Two experimental setups were designed to compare the performance of late fusion methods with the early fusion of all available sensors. In the first setup, classifiers were applied to each individual sensor to assess their performance in detecting the presence of occupants in a room; the six best-performing classifiers are reported here. The outputs of these classifiers were then combined using various late fusion methods. In the second setup, early fusion was applied by concatenating the readings of the available sensors, after which two ensemble classifiers were trained for occupancy detection. The performance of the two setups was compared, and the results revealed that the ensemble classifiers applied to the concatenated sensor readings performed better.
3.1. First Experimental Setup
The process described below was applied to each room. Five sensors are available per room and, since the label is Boolean, the classification problem consists of two categories (Occupied, Not Occupied). For the recognition of occupancy based on the information from each sensor individually, several classifiers were tested, and the results of the six that performed best are reported here, namely SVM, kNN, RF, GB, GNB, and DT. For all classifiers, the hyperparameters were set to the defaults used in the open-source machine learning library Scikit-learn.
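As a concrete illustration of this per-sensor baseline, the sketch below trains each of the six classifiers on a single sensor stream with Scikit-learn defaults and scores it with balanced accuracy. It assumes the readings are stored in a pandas DataFrame with one column per sensor; the variable and function names are illustrative, not taken from the authors' code.

```python
# Minimal sketch of the per-sensor baseline, assuming a DataFrame with one
# column per sensor and a Boolean occupancy label.
from sklearn.base import clone
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import balanced_accuracy_score

BASE_LEARNERS = {
    "SVM": SVC(probability=True),   # probabilities are needed later for fusion
    "kNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(),
    "GB": GradientBoostingClassifier(),
    "GNB": GaussianNB(),
    "DT": DecisionTreeClassifier(),
}

def evaluate_per_sensor(X_train, y_train, X_test, y_test, sensors):
    """Train every base learner on each individual sensor and score it."""
    scores = {}
    for sensor in sensors:                      # e.g. ["Sound level", "Dust", ...]
        for name, proto in BASE_LEARNERS.items():
            model = clone(proto)                # fresh, untrained copy
            model.fit(X_train[[sensor]], y_train)
            y_pred = model.predict(X_test[[sensor]])
            scores[(sensor, name)] = balanced_accuracy_score(y_test, y_pred)
    return scores
```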
After training all types of base learners on each individual sensor, the trained models were applied to the test set, and the balanced accuracy results are shown in Figure 3. From the results, it is clear that classification based on each individual sensor does not perform well. The best-performing sensors are the Dust and Sound level ones, since in every room they achieve the highest balanced accuracy values. Their difference from the remaining sensors is evident in all rooms and ranges between 7% and 20%, with the minimum observed in Room E and the maximum in Room D. These results were expected, as Room D has the most balanced data, while Room E has the most imbalanced data. Regarding the performance of the classifiers, GB and SVM demonstrate superior performance, with the exception of Room E, where GNB achieves the highest balanced accuracy. However, the differences from the other classifiers are insignificant in the majority of cases. In general, the results suggest that balanced accuracy is influenced more by the sensor type and data quality than by the choice of classifier.
Following the application of the various classifiers to the individual sensors, the results of all available sensors per room were combined using multiple late fusion techniques. Initially, the performance of standard late fusion methods was tested, namely averaging, majority voting, weighted average, and stacking. The general guidelines for each method are described in Section 2.
For the weighted average method, the weights can be assigned either by carefully examining the effect each sensor has on the final result or by using an optimization method to produce them; the optimization criterion is the balanced accuracy. For this implementation, three cases were considered: (a) hand-picked weights [0.4, 0, 0.1, 0.4, 0.1] (Weighted Average 1), (b) hand-picked weights [0.5, 0, 0, 0.5, 0] (Weighted Average 2), and (c) Bayesian-optimized weights (Weighted Average 3). These weights correspond to the following sequence of sensors: Sound level, Temperature, Relative humidity, Dust, and Pressure. The hand-picked weights were chosen after carefully examining the subplots of Figure 3, where, in the majority of cases, the Sound level and Dust sensors gave the best results, with a significant difference from the rest. This justifies the second set of hand-picked weights, in which only these two sensors were given a value and the other three were not considered at all in the final decision. The Relative humidity and Pressure sensors are the next best in terms of balanced accuracy; thus, they were given a small, nonzero weight in the first set of hand-picked weights. The same set of values was applied to all rooms; however, the weights could differ per room and be adjusted according to the performance of each sensor.
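A minimal sketch of the weighted-average fusion with the two hand-picked weight sets is given below. It assumes that each per-sensor model outputs class probabilities and that the fused decision is the class with the largest weighted-average probability, which is one common reading of the method outlined in Section 2; the names are illustrative.

```python
# Weighted-average late fusion over per-sensor class probabilities (sketch).
import numpy as np

SENSORS = ["Sound level", "Temperature", "Relative humidity", "Dust", "Pressure"]
WEIGHTED_AVERAGE_1 = [0.4, 0.0, 0.1, 0.4, 0.1]
WEIGHTED_AVERAGE_2 = [0.5, 0.0, 0.0, 0.5, 0.0]

def weighted_average_fusion(sensor_probas, weights):
    """sensor_probas: list of (n_samples, 2) probability arrays, one per sensor."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                # normalise to sum to 1
    stacked = np.stack(sensor_probas, axis=0)        # (n_sensors, n_samples, 2)
    fused = np.tensordot(weights, stacked, axes=1)   # weighted sum over sensors
    return fused.argmax(axis=1)                      # 1 = Occupied, 0 = Not Occupied
```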
For the stacking method, the predicted probabilities on the train set from all the sensors are used to build a new model, which is then used to make predictions on the test set. To implement the stacking method, three different algorithms were used as the second-level model: (a) AdaBoosting, (b) Bagging, and (c) XtraTree. AdaBoosting, short for Adaptive Boosting, is a powerful ensemble learning method that combines multiple weak classifiers to create a strong classifier [31]. Bagging, short for Bootstrap Aggregating, is another ensemble learning technique that combines the predictions of multiple independent classifiers trained on different subsets of the training data [32]. XtraTree, short for Extremely Randomized Trees, is an ensemble of decision trees in which the split thresholds are drawn at random, which further reduces variance.
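The stacking variant can be sketched as follows, under the assumption that the second-level model is trained on the concatenated class probabilities of the per-sensor models; the function and variable names are illustrative, not the authors' implementation.

```python
# Stacking: per-sensor class probabilities become features for a meta-learner.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, ExtraTreesClassifier

META_LEARNERS = {
    "AdaBoosting": AdaBoostClassifier(),
    "Bagging": BaggingClassifier(),
    "XtraTree": ExtraTreesClassifier(),
}

def stack_probabilities(models, X):
    """Concatenate each trained sensor model's class probabilities into one matrix."""
    return np.hstack([model.predict_proba(X[[sensor]]) for sensor, model in models.items()])

def stacking_fusion(models, X_train, y_train, X_test, meta_name="XtraTree"):
    meta = META_LEARNERS[meta_name]
    meta.fit(stack_probabilities(models, X_train), y_train)   # train second-level model
    return meta.predict(stack_probabilities(models, X_test))  # fused predictions
```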
The results from the implementation of the above-mentioned fusion methods for all the rooms are presented in Table 1, Table 2, Table 3, Table 4 and Table 5. At the end of each table, the best balanced accuracy score obtained with an individual sensor is reported, as well as the maximum increase that the fusion methods achieve compared to the best-performing individual sensor. Averaging and majority voting give far worse results than the best performance of the individual sensors. The best late fusion methods appear to be the more advanced ones, namely AdaBoosting, Bagging, and XtraTree.
In the weighted average method, the optimization appears to improve the results for the base learners SVM, kNN, and GB, but not for RF, GNB, and DT. The problem is that, during the optimization process, the weights overfit the training data (the training accuracy for these models is ∼99%). Randomly splitting the train set during each optimization epoch did not improve the results.
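The Bayesian optimization of the weights can be sketched as follows. The paper does not state which library was used, so scikit-optimize (gp_minimize) is assumed here, with the negative balanced accuracy on the training set as the objective; optimizing directly on the training set also illustrates why the resulting weights can overfit.

```python
# Possible Bayesian optimisation of the fusion weights (library assumed: scikit-optimize).
import numpy as np
from skopt import gp_minimize
from skopt.space import Real
from sklearn.metrics import balanced_accuracy_score

def optimise_weights(sensor_probas_train, y_train, n_sensors=5, n_calls=50):
    stacked = np.stack(sensor_probas_train)              # (n_sensors, n_samples, 2)

    def objective(weights):
        w = np.asarray(weights, dtype=float)
        if w.sum() == 0:
            return 1.0                                    # penalise degenerate weights
        fused = np.tensordot(w / w.sum(), stacked, axes=1)
        return -balanced_accuracy_score(y_train, fused.argmax(axis=1))

    result = gp_minimize(objective,
                         [Real(0.0, 1.0) for _ in range(n_sensors)],
                         n_calls=n_calls, random_state=0)
    return result.x                                       # optimised weight vector
```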
The results revealed that the above-mentioned late fusion methods do not improve the performance of the base models RF and DT. For all the rooms with imbalanced data, the best score from a late fusion method is worse than the best score from the individual sensors. For Room D, although there is an improvement for all base learners, RF and DT show the lowest increase in balanced accuracy.
In the following, the implementation of more advanced weighted fusion methods is described. As explained in Section 2 of this paper, in the weighted late fusion framework each sensor is assigned multiple weight values (Equation (5)), equal in number to the classes, not just one; in our case, two weights therefore correspond to each sensor. Since the two classes are imbalanced, these values are based on the detection rate of each class, which is computed during the training stage of each sensor. The detection rate of the “No Occupancy” class (Equation (14)) is equal to the ratio of the true negatives (TN) to all the predicted values, and the detection rate of the “Occupancy” class (Equation (15)) is equal to the ratio of the true positives (TP) to all the predicted values.
To assist the recognition of the classes that are not so easily detected, the weights were set equal to the complement of each detection rate, i.e., one minus the detection rate (Equations (16) and (17)), as proposed in [16].
Another approach to computing the weights, instead of relying on the detection rate, is to use an optimization method; the algorithm used is again Bayesian optimization. The results of these two frameworks are presented together with those of the class-based framework described next. The weighted late fusion method with the detection-rate-based weights is referred to as Weighted Late Fusion 1, and the one with the optimized weights as Weighted Late Fusion 2.
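Weighted Late Fusion 1 can be sketched as follows, under the assumption that the detection rates are computed from each sensor's training-stage confusion matrix and that fusion is a class-wise weighted sum of the per-sensor probabilities; Equations (14)–(17) are given in Section 2 and are not reproduced here, so this is only an illustrative reading of them.

```python
# Detection-rate-based class weights and weighted late fusion (sketch).
import numpy as np
from sklearn.metrics import confusion_matrix

def detection_rate_weights(y_train, y_pred_train):
    """Return one weight per class for a single sensor (complement of its detection rate)."""
    tn, fp, fn, tp = confusion_matrix(y_train, y_pred_train).ravel()
    total = tn + fp + fn + tp
    dr_no_occupancy = tn / total          # detection rate of the "No Occupancy" class
    dr_occupancy = tp / total             # detection rate of the "Occupancy" class
    return np.array([1.0 - dr_no_occupancy, 1.0 - dr_occupancy])

def weighted_late_fusion(sensor_probas, class_weights):
    """class_weights: (n_sensors, 2) array; sensor_probas: list of (n_samples, 2) arrays."""
    weighted = [w * p for w, p in zip(class_weights, sensor_probas)]  # per-class weighting
    return np.sum(weighted, axis=0).argmax(axis=1)
```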
For the class-based weighted late fusion framework, the adaptation parameter is assigned values ranging from 0 to 1. For the present experiments, two cases were examined: (a) using a fixed adaptation parameter equal to 0.25 (Class-based Late Fusion 1), and (b) using an adaptation parameter optimized for each sensor (Class-based Late Fusion 2). The algorithm used for the optimization is again Bayesian optimization. The results from the weighted late fusion frameworks and from the class-based weighted late fusion framework for all the rooms are given in Table 6, Table 7, Table 8, Table 9 and Table 10. At the end of each table, the best balanced accuracy score obtained with an individual sensor is reported, as well as the maximum increase achieved by the weighted late fusion frameworks.
The implementation of the weighted late fusion frameworks achieved the best results for occupancy detection in a room, compared to the previously mentioned fusion methods. For all the base learners, the balanced accuracy increases and, in some cases, reaches values higher than 80%; specifically, these results were achieved when GB classifiers were used as the base models for the late fusion. The only exception to this increase occurs in Room D, which is the room with the most balanced data. The implementation of the weighted late fusion framework using RF and DT classifiers as base learners appears to reduce the balanced accuracy.
3.2. Second Experimental Setup
In the second experimental setup, occupancy predictions were based on the concatenated readings of the five sensors available in each room, a method that diverges from the first experimental setup, where a classifier was trained on each sensor and the outputs were subsequently fused. This approach was also adopted during the Ashvin project, serving as a foundation for occupancy predictions.
The chosen models, XtraTree and LGBM, are both ensemble methods, which negated the need for data scaling. A combination of grid search and cross-validation was used to tune the models' hyperparameters, thereby ensuring an optimal training outcome. The models' performance was assessed using the balanced test accuracy, a metric that compensates for imbalances in the class distribution. Upon evaluation, as illustrated in Table 11, the XtraTree model held an advantage over the LGBM model in predicting room occupancy, with an increase in accuracy of between 1% and 2% per room.
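A sketch of this second setup is shown below: the five sensor readings are concatenated into a single feature matrix, and the two ensemble models are tuned with grid search and cross-validation on balanced accuracy. The parameter grids are illustrative assumptions, not those used in the paper.

```python
# Early fusion: concatenated sensor readings, tuned with grid search + CV (sketch).
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from lightgbm import LGBMClassifier

PARAM_GRIDS = {
    "XtraTree": (ExtraTreesClassifier(), {"n_estimators": [100, 300], "max_depth": [None, 10]}),
    "LGBM": (LGBMClassifier(), {"n_estimators": [100, 300], "num_leaves": [31, 63]}),
}

def tune_early_fusion(X_train, y_train):
    """X_train holds the concatenated readings of all five sensors for one room."""
    best = {}
    for name, (model, grid) in PARAM_GRIDS.items():
        search = GridSearchCV(model, grid, scoring="balanced_accuracy", cv=5)
        search.fit(X_train, y_train)
        best[name] = search.best_estimator_   # tuned model, ready for the test set
    return best
```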
In summary, the analysis indicates a modest superiority of the XtraTree algorithm over the LGBM algorithm in predicting room occupancy from the various sensor inputs. Compared to the results of the first experimental setup, although the increase in balanced accuracy obtained by the basic late fusion methods is significant relative to the individual-sensor classification, it does not exceed the performance of XtraTree and LGBM on the combined sensor readings. The only exception is Room D (the room with the most balanced data), where, with GNB as the base model and XtraTree as the fusion method, the balanced accuracy reaches 85.94%, whereas the values for the combined sensor readings with XtraTree and LGBM are 82.9% and 81.8%, respectively.
The results are similar for the more advanced late fusion frameworks: the balanced accuracy of the weighted late fusion and the class-based weighted late fusion does not exceed the performance of XtraTree and LGBM on the combined sensor readings. The only exception is Room A, where, with GB as the base model and Weighted Late Fusion 2 as the fusion method, the balanced accuracy reaches 81.27%, whereas the values for the combined sensor readings with XtraTree and LGBM are 80.3% and 79.2%, respectively.