Occupancy estimation in smart buildings using predictive modeling in imbalanced domains

Jiaxun Guo¹,
Manar Amayri²,
Fatma Najar¹,
Wentao Fan³ &
Nizar Bouguila ORCID: orcid.org/0000-0001-7224-7940¹


Abstract

This paper introduces a novel approach for occupancy estimation in smart buildings. In particular, we focus on the challenging yet common situation where the training data are scarce and imbalanced (i.e., the classes are not approximately equally represented). Our model consists of two parts: predictive modelling, performed via the inverted Dirichlet mixture model (IDMM), and an over-sampling approach that we propose. The first part, whose main goal is to tackle the small-training-data problem, computes the predictive distribution of the IDMM by marginalizing over its parameters with respect to their posterior distributions, which are estimated by a Bayesian variational inference approach that we develop. The second part, based on over-sampling, complements the first by tackling the imbalanced-domain problem. Extensive experiments and simulations involving synthetic data as well as real data extracted from smart building sensors show the merits of our statistical framework.

Data availability

Data can be made available on reasonable request.

Notes

When the training set becomes sufficiently large, the variance of the posterior distribution decreases, so the predictive distribution, which can be viewed as an average over the model's parameters (Snelson and Ghahramani 2005), can be approximated by $f({\textbf {x}}|{\hat{\theta }})$, where ${\hat{\theta }}$ is a point estimate (obtained, for instance, via maximum a posteriori estimation, expectation propagation or variational Bayes) (Gelman et al. 1996; Sinharay and Stern 2003).
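To make the note above concrete, the sketch below compares a Monte Carlo estimate of the predictive density (the average of $f({\textbf {x}}|\varvec{\alpha })$ over posterior samples) with the plug-in density at the posterior mean. It is a simplification under stated assumptions: a single inverted Dirichlet component rather than the full mixture, and independent Gamma posteriors on each $\alpha_d$ with shape $u_d^*$ and rate $v_d^*$, the form that appears in the Appendix (Eq. 25). The function names are hypothetical.

```python
import math
import random


def log_inv_dirichlet_pdf(x, alpha):
    """Log-density of the inverted Dirichlet: x has D positive entries, alpha has D + 1."""
    if len(alpha) != len(x) + 1:
        raise ValueError("alpha must have len(x) + 1 entries")
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # zip stops after D pairs, so only alpha_1..alpha_D multiply ln x_d.
    log_kernel = sum((a - 1.0) * math.log(xd) for xd, a in zip(x, alpha))
    return log_norm + log_kernel - sum(alpha) * math.log1p(sum(x))


def predictive_mc(x, u_star, v_star, n_samples=2000, seed=0):
    """Monte Carlo estimate of f(x|X) = E_q[f(x|alpha)], q(alpha_d) = Gamma(shape u_d*, rate v_d*)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # random.gammavariate is parameterized by (shape, scale), so scale = 1 / rate.
        alpha = [rng.gammavariate(u, 1.0 / v) for u, v in zip(u_star, v_star)]
        total += math.exp(log_inv_dirichlet_pdf(x, alpha))
    return total / n_samples


def predictive_plugin(x, u_star, v_star):
    """Plug-in approximation f(x|alpha_hat) with alpha_hat the posterior mean u_d*/v_d*."""
    alpha_hat = [u / v for u, v in zip(u_star, v_star)]
    return math.exp(log_inv_dirichlet_pdf(x, alpha_hat))
```

For example, with `x = [0.8]`, `u_star = [s, s]` and `v_star = [s / 2, s / 3]` (posterior means 2 and 3), a large shape such as `s = 1e4` gives a posterior standard deviation of 1% of each mean, and the two estimates nearly coincide; with a small shape such as `s = 5` they will generally differ more, which is the regime where marginalizing over the parameters matters.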
Data generation techniques themselves can be categorized into two groups (Branco et al. 2016). The first group introduces perturbations, i.e., it produces noisy replicates of existing data. The second group interpolates between existing data points. Our approach belongs to the second group.
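As an illustration of the interpolation family, here is a minimal generator in the spirit of SMOTE (Chawla et al. 2002). It is not the over-sampling method proposed in this paper, whose details are not given in this section; the function name `smote_like_oversample` and the defaults `k=3`, `seed=0` are hypothetical choices for the sketch.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """SMOTE-style over-sampling: each synthetic point lies on the segment between
    a random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = sorted((p for j, p in enumerate(minority) if j != i),
                        key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight drawn uniformly from [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic
```

Because each synthetic point is a convex combination of two positive vectors, it stays positive, which matters when the downstream model (such as an inverted Dirichlet mixture) is defined only for positive data.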

References

Ahmad J, Larijani H, Emmanuel R, Mannion M, Javed A (2021) Occupancy detection in non-residential buildings: a survey and novel privacy preserved occupancy monitoring solution. Appl Comput Inf 17(2):279–295
Alawneh L, Alsarhan T, Al-Zinati M, Al-Ayyoub M, Jararweh Y, Lu H (2021) Enhancing human activity recognition using deep learning and time series augmented data. J Ambient Intell Humaniz Comput 12(12):10565–10580
Amayri M, Arora A, Ploix S, Bandhyopadyay S, Ngo QD, Badarla VR (2016) Estimating occupancy in heterogeneous sensor environment. Energy Build 129:46–58
Amayri M, Ploix S, Bouguila N, Wurtz F (2019) Estimating occupancy using interactive learning with a sensor environment: real-time experiments. IEEE Access 7:53932–53944
Bdiri T, Bouguila N (2011) Learning inverted Dirichlet mixtures for positive data clustering. In: Kuznetsov SO, Slezak D, Hepting DH, Mirkin BG (eds) Rough sets, fuzzy sets, data mining and granular computing - 13th international conference, RSFDGrC 2011, Moscow, Russia, June 25-27, 2011. Proceedings. Lecture Notes in Computer Science. Springer, Berlin, pp 265–272
Bdiri T, Bouguila N (2011a) Neural information processing - 18th international conference, ICONIP 2011, Shanghai, China, November 13-17, 2011, proceedings. In: Lu B, Zhang L, Kwok JT (eds) Neural Information Processing, vol 7063. Lecture Notes in Computer Science. Springer, Berlin, pp 71–78
Bdiri T, Bouguila N (2012) Positive vectors clustering using inverted Dirichlet finite mixture models. Expert Syst Appl 39(2):1869–1882
Bdiri T, Bouguila N (2013) Bayesian learning of inverted Dirichlet mixtures for SVM kernels generation. Neural Comput Appl 23(5):1443–1458
Benmansour A, Bouchachia A, Feham M (2016) Multioccupant activity recognition in pervasive smart home environments. ACM Comput Surv 48(3):34:1–34:36
Bentouati B, Khelifi A, Shaheen AM, El-Sehiemy RA (2021) An enhanced moth-swarm algorithm for efficient energy management based multi dimensions OPF problem. J Ambient Intell Humaniz Comput 12(10):9499–9519
Bishop CM (2006) Pattern recognition and machine learning. Information science and statistics. Springer, New York
Bjornstad JF (1990) Predictive likelihood: a review. Stat Sci 5(2):242–254
Bouguila N, Fan W (2020) Mixture models and applications. Springer, Berlin
Branco P, Torgo L, Ribeiro RP (2016) A survey of predictive modeling on imbalanced domains. ACM Comput Surv 49(2):1–50
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16(1):321–357
D'Oca S, Hong T, Langevin J (2018) The human dimensions of energy use in buildings: a review. Renew Sustain Energy Rev 81:731–742
Diethe T, Twomey N, Flach PA (2016) Active transfer learning for activity recognition. In: 24th European Symposium on Artificial Neural Networks, ESANN
Djenouri D, Laidi R, Djenouri Y, Balasingham I (2019) Machine learning for smart building applications: review and taxonomy. ACM Comput Surv 52(2):1–36
Fan W, Bouguila N (2020) Spherical data clustering and feature selection through nonparametric Bayesian mixture models with von Mises distributions. Eng Appl Artif Intell 94:103781
Fan W, Bouguila N, Du J, Liu X (2019) Axially symmetric data clustering through Dirichlet process mixture models of Watson distributions. IEEE Trans Neural Netw Learn Syst 30(6):1683–1694
Fan W, Yang L, Bouguila N (2021) Unsupervised grouped axial data modeling via hierarchical Bayesian nonparametric models with Watson distributions. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2021.3128271
Gelman A, Meng XL, Stern H (1996) Posterior predictive assessment of model fitness via realized discrepancies. Stat Sinica 6:733–807
Hao C, Chen D (2021) Software/hardware co-design for multi-modal multi-task learning in autonomous systems. In: 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), IEEE, pp 1–5
Hossain HMS, Khan MAAH, Roy N (2017) Active learning enabled activity recognition. Pervasive Mob Comput 38:312–330 (IEEE International Conference on Pervasive Computing and Communications (PerCom) 2016)
Huang Q, Hao K (2020) Development of CNN-based visual recognition air conditioner for smart buildings. J Inf Technol Constr 25:361–373
Huang Q, Mao C (2017) Occupancy estimation in smart building using hybrid CO₂/light wireless sensor network. J Appl Sci Arts 1(2):5
Huang Q, Ge Z, Lu C (2016) Occupancy estimation in smart buildings using audio-processing techniques. In: International conference on computing in civil and building engineering. Osaka, Japan, pp 1413–1420
Huang Q, Rodriguez K, Whetstone N, Habel S (2019) Rapid internet of things (IoT) prototype for accurate people counting towards energy efficient buildings. J Inf Technol Constr 24:1–13
Jaouhari SE, Bouabdallah A, Corici AA (2021) SDN-based security management of multiple WoT smart spaces. J Ambient Intell Humaniz Comput 12(10):9081–9096
Li M, Zhou P, Liu Y, Wang H (2020a) Data-driven predictive probability density function control of fiber length stochastic distribution shaping in refining process. IEEE Trans Autom Sci Eng 17(2):633–645
Li T, Chien Y, Chou C, Liao C, Cheah W, Fu L, Chen CC, Chou C, Chen I (2020b) A fast and low-cost repetitive movement pattern indicator for massive dementia screening. IEEE Trans Autom Sci Eng 17(2):771–783
Ma Z, Leijon A (2011) Approximating the predictive distribution of the beta distribution with the local variational method. In: 2011 IEEE International Workshop on Machine Learning for Signal Processing, pp 1–6
Ma Z, Leijon A, Tan ZH, Gao S (2014) Predictive distribution of the Dirichlet mixture model by local variational inference. J Signal Process Syst 74(3):359–374
Manouchehri N, Dalhoumi O, Amayri M, Bouguila N (2020) Variational learning of a shifted scaled Dirichlet model with component splitting approach. In: Third International Conference on Artificial Intelligence for Industries, AI4I 2020, Irvine, CA, USA, September 21-23, 2020, IEEE, pp 75–78
Menardi G, Torelli N (2014) Training and assessing classification rules with imbalanced data. Data Min Knowl Discov 28(1):92–122
Naser A, Lotfi A, Zhong J (2020) Adaptive thermal sensor array placement for human segmentation and occupancy estimation. IEEE Sens J 21(2):1993–2002
Nasfi R, Amayri M, Bouguila N (2020) A novel approach for modeling positive vectors with inverted Dirichlet-based hidden Markov models. Knowl-Based Syst 192:105335
Nguyen TA, Aiello M (2013) Energy intelligent buildings based on user activity: a survey. Energy Build 56:244–257
Oldewurtel F, Sturzenegger D, Morari M (2013) Importance of occupancy information for building climate control. Appl Energy 101:521–532
Rabie AH, Saleh AI, Ali HA (2021) Smart electrical grids based on cloud, IoT, and big data technologies: state of the art. J Ambient Intell Humaniz Comput 12(10):9449–9480
Sefidpour A, Bouguila N (2012) Spatial color image segmentation based on finite non-Gaussian mixture models. Expert Syst Appl 39(10):8993–9001
Shao W, Ge Z, Yao L, Song Z (2020) Bayesian nonlinear Gaussian mixture regression and its application to virtual sensing for multimode industrial processes. IEEE Trans Autom Sci Eng 17(2):871–885
Siirtola P, Röning J (2021) Context-aware incremental learning-based method for personalized human activity recognition. J Ambient Intell Humaniz Comput 12(12):10499–10513
Sinharay S, Stern HS (2003) Posterior predictive model checking in hierarchical models. J Stat Plann Inference 111(1):209–221
Snelson E, Ghahramani Z (2005) Compact approximations to Bayesian predictive distributions. In: Proceedings of the 22nd International Conference on Machine Learning, Association for Computing Machinery, New York, NY, USA, pp 840–847
Tiao GC, Guttman I (1965) The inverted Dirichlet distribution with applications. J Am Stat Assoc 60(311):793–805
Tirdad P, Bouguila N, Ziou D (2015) Variational learning of finite inverted Dirichlet mixture models and applications. In: Laalaoui Y, Bouguila N (eds) Artificial intelligence applications in information and communication technologies, studies in computational intelligence, vol 607. Springer, pp 119–145
Viard K, Fanti MP, Faraut G, Lesage JJ (2020) Human activity discovery and recognition using probabilistic finite-state automata. IEEE Trans Autom Sci Eng 17:2085–2096
Wang J, Zhao C (2020) A Gaussian feature analytics-based DISSIM method for fine-grained non-Gaussian process monitoring. IEEE Trans Autom Sci Eng 17:2175–2181
Yan Y, Luh PB, Pattipati KR (2020) Fault prognosis of key components in HVAC air-handling systems at component and system levels. IEEE Trans Autom Sci Eng 17:2145–2153
Yang Y, Hu G, Spanos CJ (2020) HVAC energy cost optimization for a multizone building via a decentralized approach. IEEE Trans Autom Sci Eng 17:1950–1960
Zadrozny B, Langford J, Abe N (2003) Cost-sensitive learning by cost-proportionate example weighting. In: Third IEEE International Conference on Data Mining, pp 435–442
Zheng J, Lu C, Hao C, Chen D, Guo D (2020) Improving the generalization ability of deep neural networks for cross-domain visual recognition. IEEE Trans Cognit Dev Syst 13(3):607–620

Acknowledgements

This research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC), the “Nouveaux arrivants Université Grenoble Alpes Grenoble INP - UGA”/G-SCOP program and the National Natural Science Foundation of China (61876068). The authors would like to thank the associate editor and reviewers for their helpful comments.

Author information

Authors and Affiliations

Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, H3G1T7, Canada
Jiaxun Guo, Fatma Najar & Nizar Bouguila
Grenoble Institute of Technology, Grenoble, France
Manar Amayri
The Department of Computer Science and Technology, Huaqiao University, Xiamen, China
Wentao Fan

Corresponding author

Correspondence to Nizar Bouguila.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proof of equation 11

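The logarithm of the inverse of the multivariate Beta function, $\ln \frac{\varGamma (\sum _{d=1}^{D+1}\alpha _{d})}{\prod _{d=1}^{D+1}\varGamma (\alpha _{d})}$, has been proved to be concave (Ma et al. 2014). Since a concave function lies below its tangent plane at any point, a first-order Taylor expansion around $\tilde{\varvec{\alpha }}$ gives the following inequality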

$$\begin{aligned} \begin{aligned} \ln \frac{\varGamma (\sum _{d=1}^{D+1}\alpha _{d})}{\prod _{d=1}^{D+1}\varGamma (\alpha _{d})}&\le \ln \frac{\varGamma (\sum _{d=1}^{D+1}\tilde{\alpha }_{d})}{\prod _{d=1}^{D+1}\varGamma (\tilde{\alpha }_{d})}\\&+\sum _{d=1}^{D+1}\left[ \psi \left( \sum _{d=1}^{D+1}\tilde{\alpha }_{d}\right) -\psi (\tilde{\alpha }_{d})\right] (\alpha _{d}-\tilde{\alpha }_{d}) \end{aligned} \end{aligned}$$

(23)

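where $\tilde{\varvec{\alpha }}=(\tilde{\alpha }_{1},\dots ,\tilde{\alpha }_{D+1})$ is any point from the posterior distribution (by concavity, the inequality in fact holds for any positive expansion point). Taking the exponential of both sides, we have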

$$\begin{aligned} \begin{aligned} \frac{\varGamma (\sum _{d=1}^{D+1}\alpha _{d})}{\prod _{d=1}^{D+1}\varGamma (\alpha _{d})}&\le \frac{\varGamma (\sum _{d=1}^{D+1}\tilde{\alpha }_{d})}{\prod _{d=1}^{D+1}\varGamma (\tilde{\alpha }_{d})}\\&\times e^{\sum _{d=1}^{D+1}\left[ \psi \left( \sum _{d=1}^{D+1}\tilde{\alpha }_{d}\right) -\psi (\tilde{\alpha }_{d})\right] (\alpha _{d}-\tilde{\alpha }_{d})} \end{aligned} \end{aligned}$$

(24)
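As a numerical sanity check of Eq. 23, the sketch below evaluates both sides at random pairs $(\varvec{\alpha }, \tilde{\varvec{\alpha }})$ and reports the largest value of left side minus right side. The concavity it relies on is consistent with a standard fact: $\ln B(\varvec{\alpha })$ is the log-partition function of the Dirichlet exponential family and is therefore convex, so its negative is concave. Python's standard library has no digamma function, so `digamma` here is a hand-written approximation (recurrence plus asymptotic series); all function names and the sampling range are hypothetical.

```python
import math
import random


def digamma(x):
    """psi(x) for x > 0: shift x above 10 via psi(x) = psi(x + 1) - 1/x, then use the asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def log_inv_beta(alpha):
    """Left-hand side of Eq. 23: ln Gamma(sum_d alpha_d) - sum_d ln Gamma(alpha_d)."""
    return math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)


def tangent_bound(alpha, alpha_tilde):
    """Right-hand side of Eq. 23: first-order Taylor expansion around alpha_tilde."""
    psi_sum = digamma(sum(alpha_tilde))
    slope = [psi_sum - digamma(at) for at in alpha_tilde]
    return log_inv_beta(alpha_tilde) + sum(
        s * (a - at) for s, a, at in zip(slope, alpha, alpha_tilde))


def max_violation(n_trials=1000, dim=3, seed=0):
    """Largest lhs - rhs over random pairs; concavity predicts a value <= 0 up to rounding."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(n_trials):
        alpha = [rng.uniform(0.1, 20.0) for _ in range(dim)]
        alpha_tilde = [rng.uniform(0.1, 20.0) for _ in range(dim)]
        worst = max(worst, log_inv_beta(alpha) - tangent_bound(alpha, alpha_tilde))
    return worst
```

The bound is tight at $\varvec{\alpha } = \tilde{\varvec{\alpha }}$, and exponentiating both sides, as in Eq. 24, preserves the inequality because the exponential is increasing.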

Substituting (24) into (7) and rearranging, we obtain the following upper bound

$$\begin{aligned} \begin{aligned} f({\textbf {x}}|{\textbf {X}})&\le \int \cdots \int \frac{\varGamma (\sum _{d=1}^{D+1}\tilde{\alpha }_{d})}{\prod _{d=1}^{D+1}\varGamma (\tilde{\alpha }_{d})}\\&\quad \times e^{\sum _{d=1}^{D+1}\left[ \psi \left( \sum _{d=1}^{D+1}\tilde{\alpha }_{d}\right) -\psi (\tilde{\alpha }_{d})\right] (\alpha _{d}-\tilde{\alpha }_{d})}\\&\quad \times x_{1}^{\alpha _{1}-1}\frac{(v_{1}^{*})^{u_{1}^{*}}}{\varGamma (u_{1}^{*})}\alpha _{1}^{u_{1}^{*}-1}e^{-v_{1}^{*}\alpha _{1}}(1+\sum _{d=1}^{D}x_{d})^{-\alpha _{1}}\\&\quad \cdots \\&\quad \times x_{D+1}^{\alpha _{D+1}-1}\frac{(v_{D+1}^{*})^{u_{D+1}^{*}}}{\varGamma (u_{D+1}^{*})}\alpha _{D+1}^{u_{D+1}^{*}-1}\\&\quad \quad \quad e^{-v_{D+1}^{*}\alpha _{D+1}}(1+\sum _{d=1}^{D}x_{d})^{-\alpha _{D+1}}\,d\alpha _{1}\cdots d\alpha _{D+1}\\&= \frac{\varGamma (\sum _{d=1}^{D+1}\tilde{\alpha }_{d})}{\prod _{d=1}^{D+1}\varGamma (\tilde{\alpha }_{d})} \times e^{-\sum _{d=1}^{D+1}\tilde{\alpha }_{d}\left[ \psi \left( \sum _{d=1}^{D+1}\tilde{\alpha }_{d}\right) -\psi (\tilde{\alpha }_{d})\right] }\\&\quad \times \prod _{d=1}^{D+1}\frac{(v_{d}^{*})^{u_{d}^{*}}}{x_{d}\varGamma (u_{d}^{*})}\\&\quad \int _{0}^{\infty } e^{-\alpha _{d}\left[ v_{d}^{*}-\ln x_{d} - \psi \left( \sum _{d=1}^{D+1} \tilde{\alpha }_{d} \right) + \psi (\tilde{\alpha }_{d}) + \ln \left( 1+\sum _{d=1}^{D} x_{d} \right) \right] }\\&\quad \times \alpha _{d}^{u_{d}^{*}-1}\, d\alpha _{d}\\&\approx f_{upp}({\textbf {x}}|{\textbf {X}}) \end{aligned} \end{aligned}$$

(25)

For simplicity, let us denote

$$\begin{aligned} {\textbf {G}}(x_{d},\tilde{\varvec{\alpha }})= & {} v_{d}^{*}-\ln x_{d} - \psi \left( \sum _{d=1}^{D+1} \tilde{\alpha }_{d} \right) \nonumber \\&\quad + \psi (\tilde{\alpha }_{d}) + \ln \left( 1+\sum _{d=1}^{D} x_{d} \right) \end{aligned}$$

(26)

where $d = 1,2,\dots ,D+1$. Thus, the integral in Eq. 25 has the same form as the Gamma function integral and reduces to

$$\begin{aligned} \int _{0}^{\infty } e^{-\alpha _{d}{\textbf {G}}(x_{d},\tilde{\varvec{\alpha }})}\alpha _{d}^{u_{d}^{*}-1}\, d\alpha _{d} = \left\{ \begin{aligned} \frac{\varGamma (u_{d}^{*})}{\left[ {\textbf {G}}(x_{d},\tilde{\varvec{\alpha }})\right] ^{u_{d}^{*}}}&\quad {\textbf {G}}(x_{d},\tilde{\varvec{\alpha }}) > 0\\ \infty&\quad {\textbf {G}}(x_{d},\tilde{\varvec{\alpha }}) \le 0 \end{aligned} \right. \end{aligned}$$

(27)
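Eq. 27 is the standard Gamma-function integral. The sketch below checks it numerically by comparing a composite Simpson approximation with the closed form $\varGamma (u)/G^{u}$. Assumptions for the sketch: $u \ge 1$ (so the integrand is finite at zero), the infinite range is truncated far beyond the mean $u/G$, and the function names are hypothetical.

```python
import math


def gamma_kernel_quadrature(u, G, n=20000):
    """Composite Simpson rule for int_0^inf exp(-a*G) * a**(u-1) da (assumes u >= 1, G > 0, n even)."""
    if G <= 0.0:
        raise ValueError("for G <= 0 the integrand does not decay and the integral diverges")
    if u < 1.0 or n % 2:
        raise ValueError("this simple rule assumes u >= 1 and an even number of intervals")
    # Truncate far past the mean u/G of the Gamma(u, rate G) kernel, where the integrand is negligible.
    upper = (u + 10.0 * math.sqrt(u) + 40.0) / G
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        a = i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * math.exp(-a * G) * a ** (u - 1.0)
    return total * h / 3.0


def gamma_kernel_closed_form(u, G):
    """Right-hand side of Eq. 27 for G > 0: Gamma(u) / G**u, computed in log space."""
    return math.exp(math.lgamma(u) - u * math.log(G))
```

The `ValueError` branch mirrors the $G \le 0$ case of Eq. 27: without a positive decay rate the integrand does not vanish at infinity.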

Here, we require ${\textbf {G}}(x_{d},\tilde{\varvec{\alpha }}) > 0$ for every d, because the integral diverges when ${\textbf {G}}(x_{d},\tilde{\varvec{\alpha }}) \le 0$. Finally, the analytically tractable form of the finite upper bound of the predictive distribution is

$$\begin{aligned} \begin{aligned} f_{upp}({\textbf {x}}|{\textbf {X}}) =&\frac{\varGamma (\sum _{d=1}^{D+1}\tilde{\alpha }_{d})}{\prod _{d=1}^{D+1}\varGamma (\tilde{\alpha }_{d})} \times e^{-\sum _{d=1}^{D+1}\tilde{\alpha }_{d}\left[ \psi \left( \sum _{d=1}^{D+1}\tilde{\alpha }_{d}\right) -\psi (\tilde{\alpha }_{d})\right] }\\&\times \prod _{d=1}^{D+1}\frac{(v_{d}^{*})^{u_{d}^{*}}}{x_{d}\left[ {\textbf {G}}(x_{d},\tilde{\varvec{\alpha }})\right] ^{u_{d}^{*}}} \end{aligned} \end{aligned}$$

(28)
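The closed form in Eqs. 26 and 28 can be evaluated directly. The sketch below does so in log space, since $(v_{d}^{*})^{u_{d}^{*}}$ overflows quickly. Assumptions, stated loudly: a single inverted Dirichlet component; Gamma posteriors with shape $u_{d}^{*}$ and rate $v_{d}^{*}$; and $x_{D+1}$ set to 1 (so $\ln x_{D+1}=0$), which makes the $(D+1)$-th factor consistent with the standard $D$-dimensional inverted Dirichlet density. `digamma` is a hand-written approximation, and `g_terms`, `f_upp` and `log_inv_dirichlet_pdf` are hypothetical names.

```python
import math


def digamma(x):
    """psi(x) for x > 0 via recurrence to x >= 10 plus the asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def log_inv_dirichlet_pdf(x, alpha):
    """Standard inverted Dirichlet log-density; x has D entries, alpha has D + 1."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    log_kernel = sum((a - 1.0) * math.log(xd) for xd, a in zip(x, alpha))
    return log_norm + log_kernel - sum(alpha) * math.log1p(sum(x))


def g_terms(x, alpha_tilde, v_star):
    """Eq. 26 for d = 1..D+1, assuming x_{D+1} = 1."""
    x_ext = list(x) + [1.0]
    psi_sum = digamma(sum(alpha_tilde))
    log_one_plus_sum = math.log1p(sum(x))
    return [v - math.log(xd) - psi_sum + digamma(at) + log_one_plus_sum
            for xd, at, v in zip(x_ext, alpha_tilde, v_star)]


def f_upp(x, alpha_tilde, u_star, v_star):
    """Eq. 28: finite upper bound of the predictive density (raises if some G <= 0)."""
    G = g_terms(x, alpha_tilde, v_star)
    if any(g <= 0.0 for g in G):
        raise ValueError("G(x_d, alpha_tilde) <= 0 for some d: the integral in Eq. 27 diverges")
    x_ext = list(x) + [1.0]
    psi_sum = digamma(sum(alpha_tilde))
    log_val = math.lgamma(sum(alpha_tilde)) - sum(math.lgamma(at) for at in alpha_tilde)
    log_val -= sum(at * (psi_sum - digamma(at)) for at in alpha_tilde)
    for xd, u, v, g in zip(x_ext, u_star, v_star, G):
        # (v*)^{u*} / (x_d * G^{u*}) in log space
        log_val += u * (math.log(v) - math.log(g)) - math.log(xd)
    return math.exp(log_val)
```

One way to sanity-check the sketch: when $\tilde{\varvec{\alpha }}$ is the posterior mean $u_{d}^{*}/v_{d}^{*}$ and the posterior is very concentrated (large $u_{d}^{*}$), the bound should approach the plain density $f({\textbf {x}}|\tilde{\varvec{\alpha }})$, since the tangent in Eq. 23 is exact at $\tilde{\varvec{\alpha }}$.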

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guo, J., Amayri, M., Najar, F. et al. Occupancy estimation in smart buildings using predictive modeling in imbalanced domains. J Ambient Intell Human Comput 14, 10917–10929 (2023). https://doi.org/10.1007/s12652-022-04359-x

Received: 09 February 2022
Accepted: 28 July 2022
Published: 23 August 2022
Issue Date: August 2023
DOI: https://doi.org/10.1007/s12652-022-04359-x
