
Article

DiscHAR: A Discrete Approach to Enhance Human Activity Recognition in Cyber Physical Systems: Smart Homes

by Ishrat Fatima 1, Asma Ahmad Farhan 1, Maria Tamoor 2, Shafiq ur Rehman 3,*, Hisham Abdulrahman Alhulayyil 3 and Fawaz Tariq 4,*
1 Department of Computer Science, National University of Computer and Emerging Sciences, Lahore 54000, Pakistan
2 Department of Computer Science, Forman Christian College, Lahore 54000, Pakistan
3 College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University, Riyadh 13318, Saudi Arabia
4 Department of Geodesy and Geoinformation Science, Technical University of Berlin, 10623 Berlin, Germany
* Authors to whom correspondence should be addressed.
Computers 2024, 13(11), 300; https://doi.org/10.3390/computers13110300
Submission received: 13 October 2024 / Revised: 4 November 2024 / Accepted: 14 November 2024 / Published: 19 November 2024
Figure 1. High-level architecture of DiscHAR.
Figure 2. Elbow method to determine clusters within each activity class, where the x axis shows the number of clusters and the y axis represents distortion.
Figure 3. Detailed overview of the CNN model used in DiscHAR.
Figure 4. F1 score for the OPP79 [34] dataset, where the x axis shows epochs and the y axis represents the F1 score.
Figure 5. Loss curve for the OPP79 [34] dataset, where the x axis shows epochs and the y axis represents the loss.
Figure 6. F1 score for the PAMAP2 [30] dataset, where the x axis shows epochs and the y axis represents the F1 score.
Figure 7. Loss curve for the PAMAP2 [30] dataset, where the x axis shows epochs and the y axis represents the loss.
Figure 8. Accuracy for the WISDM [38] dataset, where the x axis shows epochs and the y axis represents the accuracy for different learning rates.
Figure 9. Loss curve for the WISDM [38] dataset, where the x axis shows epochs and the y axis represents the loss for different learning rates.

Abstract

The main challenges in smart home systems and cyber-physical systems come from not having enough data and unclear interpretation; thus, there is still a lot to be done in this field. In this work, we propose a practical approach called Discrete Human Activity Recognition (DiscHAR) based on prior research to enhance Human Activity Recognition (HAR). Our goal is to generate diverse data to build better models for activity classification. To tackle overfitting, which often occurs with small datasets, we generate data and convert them into discrete forms, improving classification accuracy. Our methodology includes advanced techniques like the R-Frame method for sampling and the Mix-up approach for data generation. We apply K-means vector quantization to categorize the data, and through the elbow method, we determine the optimal number of clusters. The discrete sequences are converted into one-hot encoded vectors and fed into a CNN model to ensure precise recognition of human activities. Evaluations on the OPP79, PAMAP2, and WISDM datasets show that our approach outperforms existing models, achieving 89% accuracy for OPP79, 93.24% for PAMAP2, and 100% for WISDM. These results demonstrate the model’s effectiveness in identifying complex activities captured by wearable devices. Our work combines theory and practice to address ongoing challenges in this field, aiming to improve the reliability and performance of activity recognition systems in dynamic environments.

1. Introduction

Human Activity Recognition (HAR) is an active research area, with applications ranging from healthcare analysis [1] to the automation of intelligent home systems [2]. Despite its potential, automated HAR still faces challenges like overfitting, which mainly arises due to limited data and the inability of traditional processes to capture the finer details of human activity through sensors. This can lead to situations where the HAR system does not correctly identify human actions, impacting its performance in critical areas like health checks or home care. Solving these issues is important not only to improve the accuracy of HAR but also to unlock its full potential in different application areas. In cyber-physical systems, accurate HAR is critical for ensuring smooth interaction between physical components and computational elements. As these systems grow in complexity, HAR models must reliably process and interpret sensor data to maintain safety, efficiency, and user satisfaction in real-time environments. In healthcare, for example, accurately identifying activities can lead to early intervention, potentially saving lives. Similarly, accurate HAR in smart homes can improve energy efficiency and enhance the user experience by automating tasks based on known activities. The importance of solving these problems can be seen from both scientific and practical perspectives. However, overfitting remains a common issue in HAR, and models trained with limited data generally perform poorly in new scenarios [3]. Additionally, reliance on fixed features often hinders the adaptability of HAR systems to various real-world situations. Researchers are exploring new methods, such as discrete representations [4], which could improve HAR accuracy, especially in critical areas like healthcare. Accurate HAR is vital in clinical practice because it enables early detection, improves patient outcomes, and reduces medical costs. Advances in HAR use sensor data to compare state-of-the-art models and highlight current challenges and future research directions in the field.
Moreover, a recent study [5] showed the possibility of combining multiple datasets to improve HAR performance, demonstrating that fusing sensor data with video input can create better information models. Another study by Martinez-Rios and Alvarez [6] highlighted the benefits of using adaptive training in HAR, which protects the pre-trained intensity model from impact effects, thereby mitigating the problem of limited data usage. In addition, the benefits of employing tracking methods in a HAR model for the capture of physical characteristics were explained by Han et al. [7] to enhance recognition accuracy.
Furthermore, with the advent of smart devices, rapid progress in HAR has been observed [8]. Examples include fitness monitors [1] and smartwatches [9], which have embedded accelerometers and gyroscopes that continuously monitor fine-grained motions. This uninterrupted stream of data enables accurate, real-time tracking and recognition of different activities. As an illustration, an experiment showed that wearable sensors can detect subtle abnormalities in human gait, which is essential for the early identification of neurological disorders [10]. Recently, Shao et al. [3] introduced a novel approach called ConvBoost that enhances the capacity of Convolutional Neural Networks (CNNs) to recognize activities in sensor data by augmenting the data to address overfitting, improving model accuracy through enhanced training methods.
Das et al. [2] argued that one of the emerging focuses in HAR is the interpretability of artificial intelligence, especially for smart homes. Their argument is particularly important given the need for clear decision-making processes within any system that employs AI in the management of patients’ health status, as well as in home automation systems. By making AI more understandable, these systems can earn people’s trust and provide insights that are easier to act on.
In recent years, many techniques, such as data generation [3] and deep learning [11], have been used to solve issues related to HAR. However, these techniques still struggle with overfitting and with continuous data streams for which they fail to predict the correct activity. For example, deep learning models require varied and large datasets; on the other hand, methods that augment data might miss out on capturing the full range of human activities.
In this paper, we propose a novel approach called Discrete Human Activity Recognition (DiscHAR) that addresses these challenges by combining advanced techniques. We use a sampling process called R-Frame [3] and a data augmentation method known as the Mix-up approach [3,12] to create rich and varied data. After that, we apply K-means vector quantization to turn these data into distinct categories with the aim of improving the model’s accuracy and interpretability.
The limited availability of sensor data and the complexity of human activities pose serious challenges in developing reliable and accurate HAR models, particularly in addressing overfitting and feature representation. However, these challenges also bring opportunities for innovation: by using techniques such as semi-supervised learning and higher-order data representations like vector quantization, we can resolve data gaps and improve model interpretability.
For example, Haresamudram et al. [13] examined enhancements to contrastive predictive coding and demonstrated the role of self-supervision in learning representations from unlabeled data. This approach can reduce dependence on large labeled datasets and improve the learning of HAR systems. Similarly, Swain et al. [14] explored the use of WiFi network logs to describe student interactions and their relationship to learning, showing the diverse applications of HAR.
Our approach reduces overfitting and increases data diversity by employing data generation methodologies, including an augmentation layer (Mix-up) and a sampling scheme (R-Frame), to address HAR challenges. Next, we convert continuous features into discrete representations using K-means vector quantization, which enhances HAR accuracy and resolves enduring issues in the area. Through accurate and thorough analysis, we aim to demonstrate the efficacy of our proposed method and further the development of HAR technology [15].
The integration of HAR with the Internet of Things (IoT) is also a possible area for more research. Relevant information may flow easily from IoT devices, improving HAR systems and generating more intelligent and flexible settings. For example, as shown by [16,17], IoT-enabled HAR systems can dynamically change home automation settings based on real-time activity recognition. This integration highlights the opportunity for changes in complex HAR techniques that can result in more specific and sensitive smart home experiences.
The following are our research contributions achieved in this work:
  • Data generation: We present a novel method that uses a sampling window to obtain samples and then generates data using the Mix-up technique, addressing data limitations in human activity recognition.
  • Discrete representation: The generated continuous data are converted to a discrete form using K-means vector quantization to improve accuracy.
  • Reduced overfitting: Discrete data can help in reducing the risk of overfitting, as the model is less likely to fit to noise and minor fluctuations in the data.
  • Accuracy: In capturing similar movements, our proposed system outperformed existing state-of-the-art methods in terms of accuracy. Specifically, our model achieved 89% accuracy for OPP79, 93.24% for PAMAP2, and 100% for WISDM.
In summary, we are able to utilize this technology in many kinds of applications by resolving common problems like overfitting and feature representation in HAR. Our approach uses data generation and representation techniques and aims to improve the accuracy and usability of HAR for applications such as healthcare and smart homes, paving the way for new applications in these fields. Through continuous research and analysis, we strive to improve HAR and its impact on quality of life and well-being. By combining new advances with existing methods, as proposed in [2,3], we foresee a future in which HAR systems will be more accurate, interpretable, and versatile, advancing smart environments and personal healthcare capabilities.
In the following sections, we first provide a comprehensive literature review (Section 2), examining existing research and developments in human activity recognition (HAR) and the application of deep learning techniques. This sets the stage for our proposed methodology (Section 3), which includes data generation, feature extraction, and classification specifically designed for HAR. We then describe the datasets (Section 4) and our experimental setup (Section 5), including preprocessing steps. The architecture of our convolutional neural network (CNN) model (Section 5.2) is outlined next, describing how it was structured and optimized for HAR tasks. Finally, we present our experimental results in Section 5.3, analyzing the performance and effectiveness of our approach and discussing its contribution to the advancement of HAR technologies.

2. Literature Review

In recent years, various methods have been explored to improve human activity recognition (HAR) systems, particularly in handling challenges like overfitting and underfitting, data limitations, and interpretability issues. We categorize the key literature into several themes, including machine learning HAR methods, deep learning approaches, AI-based approaches, and computer vision techniques.
  • Machine Learning Approaches for HAR
Previous research on human activity recognition (HAR) has explored various ways to deal with the problems of over- and underfitting. Traditional methods have focused on machine learning techniques that use methods such as Support Vector Machine (SVM) and random forest to classify tasks based on traditional features [18]. However, these methods often face limitations in dealing with the complexity and variability of human activities, so more methods are needed.
Previously, it was common for human activity recognition (HAR) to rely heavily on classical machine learning algorithms. These techniques required experts to manually extract features from sensor data and train classifiers on them, such as Support Vector Machines (SVMs) and k-Nearest Neighbors (k-NN), to enable the classification of various human behaviors [19]. Although these methods work well, they also have some disadvantages. They often have difficulty recognizing new or less visible activities and adapting to changes in sensor placement or user behavior. Additionally, because features are hand-crafted, these models sometimes fail to capture varying activity patterns across individuals. Despite these limitations, early two-level taxonomy models provide an important starting point for the development of more complex HAR processes and allow us to better understand individual activities.
In [14], a combination of gradient boosting and linear regression models was used to analyze data collected from 163 students in 54 program groups. This dataset includes surveys and WiFi-based sensing logs. Three models are presented: MPeer, trained on peer evaluation scores from self-reported survey responses; MIndi, focusing on individual features; and MColloc, dedicated to features representing collocation among group members. Despite achieving a high precision score of 0.81, the models exhibit a sensitivity of 0.75 due to the challenge of accurately determining individuals’ locations. The research identified a gap in WiFi-based sensing for precise location prediction, motivating further investigation to enhance model performance in this context.
Examining HAR, [20] delved into understanding and categorizing human actions using tools like smartphones and AI, underscoring HAR’s growing significance across diverse applications. The review spanned from 2011 to 2021, focusing on devices, AI techniques, and real-world applications. Despite HAR’s notable impact in healthcare, AI’s potential remains in its early stages, warranting more dependable and unbiased models. Additionally, the paper highlights the scarcity of research on abnormal activity detection and human action forecasting.
Given the complexity of modeling the data generation process in such environments, human activity recognition (HAR) studies that account for this complexity are important, especially in high-value application areas. One such study presented a new approach emphasizing the interactions between different system components and their meanings, considering a wide range of designs and data collection requirements using sophisticated neural architecture search (NAS) technology. The Sussex-Huawei Motion Dataset, a database that collects sensor data of multiple types, including accelerometer, gyroscope, cellular network, WiFi network, and audio data, was used to assess the performance of the proposed method. The study demonstrated the effectiveness of the design by constraining the learning process of the neural network. These findings highlight the potential to create robust and effective learning systems, especially for data sampling, paving the way for improved activity recognition in sensor-rich environments.
  • Deep Learning-Based Approaches for HAR
In recent years, deep learning has emerged as an important component in HAR, providing the ability to learn representations directly from raw data. Convolutional Neural Networks (CNNs) have become popular due to their strong performance in learning spatial and temporal patterns from sensor data, leading to better recognition [21]. The review in [22] surveys machine learning and deep learning developments within the scope of HAR and concludes by examining various ways in which CNNs can be used to solve overfitting, as well as feature representation issues, in HAR.
The application of deep learning allows for the automatic extraction of hierarchical features straight from raw sensor data, which has advanced the area of HAR in recent years. In particular, in terms of capturing spatial and temporal patterns in time-series data, CNNs have advanced to the point where they are well suited for recognition tasks. CNN architectures such as 1D-CNN and 2D-CNN are used for HAR and operate directly on raw sensor data or on spectrograms obtained from the sensor readings [23]. CNNs, which learn hierarchical representations, perform better than classical machine learning models, especially in situations with large and diverse data.
For applications like healthcare, the challenges of using deep learning to analyze human activities are significant [3]. While deep learning shows promising performance, it often faces issues due to the limited availability of training data, which may result in compromised performance. ConvBoost is proposed as a solution for limited training data: it generates additional training data along multiple dimensions to improve the performance and capability of ConvNet-based HAR models. Experiments have shown that ConvBoost outperforms baseline ConvNets in terms of F1 score on many datasets. However, some difficulties remain, especially in differentiating similar activities. This research adds to the body of work on HAR, with potential implications for improving healthcare and behavior analysis [3].
To improve the ability of wearable devices to recognize human activities, Haresamudram et al. [4] explored the use of discrete representations to transform continuous data. This approach promises to expand the capabilities of these devices and improve their overall performance. It involves a two-step process: first, training the model on public data using self-supervised methods and, second, fine-tuning it for specific targets. Evaluation across multiple databases shows significant improvements over traditional methods. However, the challenge of differentiating similar activities remains an open research question.
In a work proposed by Shan et al. [24], two variants of LSTM, namely a delay model and transition model, were introduced as new training strategies to solve the challenge of recognizing sporadic, non-periodic activities within a background of irrelevant activities. The delay model incorporates predefined delay intervals to add contextual depth, enhancing LSTM’s ability to detect and recognize these sporadic activities over time. The transition model, on the other hand, captures the subtle transitions between different activities, which helps in recognizing sporadic behaviors emerging from continuous actions. These models integrate both continuous data and event-related information, making them robust for the identification of critical patterns. By evaluating their approach on publicly available datasets, the authors showed promising results in detecting activities related to accident-prone scenarios, underscoring the practical utility of these advanced LSTM training methods in real-world applications.
  • AI-Based Approaches for HAR
The authors of [25] introduced a novel approach to enhance HAR using a combination of real and virtual data. Leveraging ChatGPT, a sophisticated text generator, the authors generated activity descriptions that were subsequently transformed into virtual data simulating human movements and interactions. This methodology significantly reduces the resource-intensive process of collecting real-world data, offering cost and time savings. Evaluations conducted on three widely used HAR datasets (RealWorld, PAMAP2, and USC-HAD) demonstrated notable improvements in HAR model performance. By integrating virtual data, the proposed method enhances model accuracy by 1.7% to 7.7% compared to using real data alone, showcasing the potential to enhance HAR systems efficiently.
Addressing the challenge of recognizing human activities using sensor data in scenarios with limited labeled training data, a study conducted by Plötz [5] focused on three key areas: representation learning, self-supervised methods, and cross-modality transfer. The goal was to enhance human activity recognition systems’ performance by making them smarter and more adaptable.
After exploring traditional machine learning techniques and advancements in deep learning, particularly CNNs, we discuss LLM-based approaches [26] because they offer innovative solutions for data augmentation and the handling of small datasets, which are common challenges in HAR. By leveraging LLMs, such as ChatGPT, we can synthesize data to augment our training sets, thereby addressing issues of data scarcity and overfitting. This integration not only helps us enhance task performance but also ensures privacy by fine tuning local models without exposing sensitive data, making it a practical approach for HAR applications.
Proposing enhancements to the Contrastive Predictive Coding (CPC) framework for self-supervised learning within sensor-based human activity recognition (HAR), the study discussed in [13] introduced advancements in encoder architecture, autoregressive networks, and future prediction tasks. Through extensive experimentation conducted across diverse datasets and sensor positions, the research highlighted the efficacy of the enhanced CPC framework compared to its predecessor. The findings underscore the potential of this approach in practical HAR applications, particularly in scenarios where annotated data are scarce or difficult to obtain. Emphasizing the benefits of the CPC framework, this paper underscores its ability to learn meaningful representations from abundant unlabeled sensor data.
The Textless Translatotron model provides valuable insights into quantization and the representation of data in discrete forms, which is relevant for our work. As an end-to-end speech-to-speech translation (S2ST) model, Textless Translatotron operates without relying on textual supervision by predicting discrete representations of target speech via a VQ-VAE quantizer, bypassing intermediate text and phoneme dependencies [27]. This approach not only demonstrates competitive translation quality on datasets like CVSS-C and Fisher Spanish–English, but it also shows how quantization can be effectively employed to transform continuous speech data into discrete units for high-quality language translation. Such techniques are used in our own work, providing a framework for the representation of continuous data in discrete form using vector quantization.
  • Computer Vision-based Approaches
In addition to traditional machine learning and deep learning techniques, we discuss vision-based approaches due to their ability to capture rich spatial information that is crucial for recognizing complex human activities. Vision-based methods [28], enhanced by techniques like Finite Discrete Tokens (FDT) for cross-modal alignment, address the granularity challenges between visual and textual data, leading to improved accuracy and performance. Incorporating vision-based approaches in HAR provides a comprehensive understanding of human activities, leveraging detailed visual context that complements sensor data, thereby enhancing the overall effectiveness and robustness of HAR systems.
The LayoutDM model, designed for the crafting of structured layouts with specific constraints [29], employs a discrete diffusion approach to iteratively refine layouts while considering structured data. Notably, it incorporates constraints during inference for conditional generation. LayoutDM showcases superior performance across diverse layout tasks and datasets, outpacing alternative methods and demonstrating its effectiveness. Key advantages of the model include its ability to address shortcomings in existing approaches and accommodate variable-length elements. However, the paper acknowledges the potential misuse of automatically generated content. Nonetheless, LayoutDM emerges as a successful tool for producing controlled layouts with wide-ranging applications.
Vision-based approaches offer a number of advantages over sensor-only methods. Still, some problems require more detailed study concerning human activity recognition (HAR). HAR models tend to overfit as a result of the limited availability of training data. Moreover, the lack of more definite information beyond continuous features in this field inhibits the development of models that are more accurate and interpretable. Furthermore, there remains a research gap in effectively discriminating highly similar movements in HAR applications, which is crucial for tasks requiring precision in activity recognition.
In light of these gaps, the following research questions emerge:
  • What approaches can be employed to tackle the challenge of overfitting in human activity recognition (HAR)?
  • What potential benefits does the adoption of discrete sensor data representations offer in HAR?
  • What advanced analytical tools can be applied to analyze symbolic sequences derived from discretized sensor data in HAR?
  • How do discretized data simplify the analysis of complex movements in HAR?
These research questions guide our exploration of innovative methodologies to address the challenges in HAR, focusing on mitigating overfitting, enhancing data representations, and improving the accuracy of activity recognition systems.
In the following sections, we outline our proposed methodology (Section 3), experimental setup (Section 5), and CNN model architecture (Section 5.2) and present experimental results (Section 5.3). Our methodology encompasses data generation, feature extraction, and classification tailored for HAR. We detail dataset selection, preprocessing steps, and the model architecture. Subsequently, we present details of the experimental setup and the CNN model architecture and analyze results to contribute to HAR advancement.

3. Proposed Methodology

In our proposed methodology, we start by collecting sensor data on human activities like walking, sitting, and standing. We then enhance this dataset by using techniques like sampling (R-Frame) and augmentation (Mix-up). Next, we convert these data into simpler, discrete representations using vector quantization via K-means. For the value of k in K-means, we use the elbow method to determine the optimal number of clusters for each class in the dataset. Then, we convert the discrete sequences into one-hot encoded vectors to pass them to the model. Finally, we pass this vector to the CNN model, ensuring precise recognition of human activities. Figure 1 illustrates the entire framework. Details of each step are discussed below.

3.1. Dataset

In our research, we use diverse sensor datasets from UCI repositories [30,31,32] to capture a spectrum of physical activities under controlled conditions. These datasets include information from multiple sensors, like accelerometers, gyroscopes, and magnetometers, that capture a wide range of activities performed by different people. Using these rich sensor data is essential for our study, as it helps us assess and improve how well algorithms can recognize human activities. Through our research, we aim to advance cognitive processing studies by enhancing the accuracy and reliability of these algorithms in different scenarios.

3.2. Data Preprocessing

During the data preprocessing phase, we take several steps to ensure the quality and consistency of the dataset. First, we remove null or nonexistent entries to eliminate inconsistencies; null values can distort the analysis and interpretation of the data, so removing them helps to ensure a better model. We also identify and remove duplicate records to reduce noise and improve the efficiency of the dataset for analysis. By removing null values and duplicates, we improve the quality and reliability of the dataset, making it ready for further analysis and modeling. A minimal sketch of these steps is shown below.
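The null-value and duplicate removal described above amounts to a few operations; the following is a minimal sketch using pandas, with a toy DataFrame and column names chosen purely for illustration.

```python
import numpy as np
import pandas as pd

# Toy sensor log standing in for a real recording; the columns are illustrative.
df = pd.DataFrame({
    "ax": [0.1, 0.1, np.nan, 0.3],
    "ay": [0.0, 0.0, 0.2, 0.1],
    "az": [9.8, 9.8, 9.7, 9.6],
    "label": ["walk", "walk", "walk", "sit"],
})

clean = (
    df.dropna()            # remove rows with null or nonexistent entries
      .drop_duplicates()   # remove exact duplicate records
      .reset_index(drop=True)
)
print(clean)  # two rows remain: the duplicate and the null row are dropped
```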
To address sensor limitations and irregular data sampling in HAR systems, several techniques are effective in enhancing a HAR system’s robustness in smart home environments. For instance, data augmentation and sensor fusion can enhance sensor data by creating variations and combining information from multiple sources. In cases of irregular sampling, LSTM models and Temporal Convolutional Networks (TCNs) handle sequences with varying gaps, while imputation techniques fill in the missing data. Hybrid CNN–LSTM models capture both spatial and temporal patterns, improving accuracy for complex activities. Emerging models like transformers are also promising, as they manage sequential dependencies without requiring evenly spaced data.

3.3. Feature Extraction

For human activity recognition using the WISDM [33], PAMAP2 [30], and OPPORTUNITY [34] datasets, we customize feature extraction according to the characteristics of each dataset. WISDM provides accelerometer data during activities such as walking and running. PAMAP2 includes accelerometer, gyroscope, and magnetometer readings to provide insight into daily activity. The OPPORTUNITY dataset enriches our functionality with various types of sensor data, including accelerometer, gyroscope, magnetometer, and ambient sensor readings. Using the unique properties of these data, we aim to create accurate models to recognize human activities in different situations.
  • Accelerometer Readings:
Accelerometer readings represent the acceleration experienced by the subject. In human activity recognition, the magnitude and patterns of acceleration can indicate different activities. Walking, for example, creates rapid patterns with characteristic frequencies.
$A = \sqrt{a_x^2 + a_y^2 + a_z^2}$
  • Gyroscope Readings:
A gyroscope reading measures the rotational speed of an object. Rotational movements, such as turns or changes in orientation, can indicate specific activities.
$G = \sqrt{g_x^2 + g_y^2 + g_z^2}$
where $g_x$, $g_y$, and $g_z$ are the angular velocities along the X, Y, and Z axes, respectively.
  • Magnetometer Readings:
Magnetometer readings measure the strength and direction of the magnetic field around subjects, aiding in orientation detection. They do not provide direct information about location.
$M = \sqrt{m_x^2 + m_y^2 + m_z^2}$
where $m_x$, $m_y$, and $m_z$ are the magnetic field strengths along the X, Y, and Z axes, respectively.
These sensors provide a comprehensive view of a person’s movement and orientation in three-dimensional space, enabling the recognition of various activities and gestures.
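As a concrete illustration, each of the three magnitude features above reduces to a Euclidean norm per sample; the following minimal NumPy sketch uses synthetic readings.

```python
import numpy as np

# Synthetic tri-axial readings of shape (n_samples, 3); columns are the X, Y, Z axes.
acc  = np.array([[0.1, 0.2, 9.8], [1.3, 0.4, 9.1]])        # accelerometer
gyro = np.array([[0.01, 0.05, 0.0], [0.2, 0.1, 0.05]])     # gyroscope
mag  = np.array([[22.0, -5.0, 40.0], [21.5, -4.8, 40.2]])  # magnetometer

# A = sqrt(ax^2 + ay^2 + az^2), and likewise for G and M.
A = np.linalg.norm(acc, axis=1)
G = np.linalg.norm(gyro, axis=1)
M = np.linalg.norm(mag, axis=1)
print(A, G, M)
```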

3.4. Data Generation

We use data generation techniques to create diverse and comprehensive datasets for HAR. These techniques comprise two primary processes:
  • Sampling Layer (R-Frame):
Within the sampling layer, we utilize a new method known as the Random Framing (R-Frame) booster. This method is superior to typical sliding window techniques. Rather than obtaining data in single slices at a time, R-Frame samples multiple slices during an entire period—the latter also being referred to as an “epoch”.
In every epoch, n data frames are captured that are closely adjacent to one another. Each frame, referred to as $S_i$, is a brief sequence of feature values that commences at a point $x_i$ and includes the next $L$ points, where $L$ is the length of the frame. This dense sampling gives a better and more detailed picture of the data over time.
$S_i = [x_i, x_{i+1}, \ldots, x_{i+L-1}]$
By performing denser sampling, the model builds a finer-grained understanding of activities and obtains a more detailed temporal representation, reducing data loss and better handling activity durations that change over time. By enhancing the quality and richness of the sampled data, the random frame (R-Frame) enhancer improves the results for HAR.
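A minimal sketch of this sampling layer, assuming a one-dimensional feature stream: in each epoch, n frames of length L are drawn from random start points rather than from one fixed sliding window.

```python
import numpy as np

def r_frame(x, n_frames, frame_len, rng=None):
    """Sample n_frames dense slices S_i = x[i : i + L] from random start points.

    A sketch of the R-Frame idea: each training epoch draws a fresh set of
    start points, giving denser and more varied temporal coverage than a
    single fixed sliding window.
    """
    rng = rng or np.random.default_rng()
    starts = rng.integers(0, len(x) - frame_len + 1, size=n_frames)
    return np.stack([x[i : i + frame_len] for i in starts])

stream = np.sin(np.linspace(0, 20, 500))            # synthetic sensor stream
frames = r_frame(stream, n_frames=8, frame_len=64)
print(frames.shape)                                 # (8, 64)
```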
  • Data Augmentation Layer (Mix-up):
Once we obtain data slices with the sampling layer (R-Frame), we transition to the data augmentation layer, where we apply Mix-up. This method generates novel data by blending characteristics of the existing dataset.
Consider two original samples from our dataset:
  • Sample 1:
    Features: [1, 2, 3, 4]
    Label: Class A
  • Sample 2:
    Features: [5, 6, 7, 8]
    Label: Class B
To create a new virtual sample, Mix-up employs linear interpolation between these samples using a mixing ratio (λ). For this example, we set λ = 0.7.
The new features and labels are computed as follows:
New Features = λ × [1, 2, 3, 4] + (1 − λ) × [5, 6, 7, 8]
Breaking this down, we calculate the following:
New Features = 0.7 × [1, 2, 3, 4] + 0.3 × [5, 6, 7, 8]
= [0.7 × 1 + 0.3 × 5, 0.7 × 2 + 0.3 × 6, 0.7 × 3 + 0.3 × 7, 0.7 × 4 + 0.3 × 8]
= [0.7 + 1.5, 1.4 + 1.8, 2.1 + 2.1, 2.8 + 2.4]
= [2.2, 3.2, 4.2, 5.2]
The new label is an interpolated combination of Class A and Class B based on λ. For simplicity, assume we create a label that is 70% Class A and 30% Class B.
Thus, the generated virtual sample is
  • Features: [2.2, 3.2, 4.2, 5.2]
  • Label: Weighted combination of Class A and Class B (e.g., [0.7, 0.3]).
This process enables the generation of new samples that are a blend of the original data points, thereby exposing the model to a broader range of feature combinations and label distributions. While both samples share the same feature dimensions, their values differ, allowing the Mix-up method to create new variations that help the model learn more effectively. The choice of λ = 0.7 gives a slight advantage to Class A, ensuring that the model learns from both classes without being overly biased towards one. This balanced approach helps in developing a more robust model capable of generalizing better across diverse scenarios. A runnable sketch of this computation follows.
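The worked example above can be reproduced directly; the following sketch applies Mix-up to two feature vectors and their one-hot labels with λ = 0.7, as in the text.

```python
import numpy as np

def mixup(x1, y1, x2, y2, lam=0.7):
    """Linearly interpolate two samples (features and soft labels)."""
    x_new = lam * x1 + (1.0 - lam) * x2
    y_new = lam * y1 + (1.0 - lam) * y2
    return x_new, y_new

x_a, y_a = np.array([1.0, 2.0, 3.0, 4.0]), np.array([1.0, 0.0])  # Class A
x_b, y_b = np.array([5.0, 6.0, 7.0, 8.0]), np.array([0.0, 1.0])  # Class B

x_mix, y_mix = mixup(x_a, y_a, x_b, y_b, lam=0.7)
print(x_mix)  # [2.2 3.2 4.2 5.2]
print(y_mix)  # [0.7 0.3]
```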

3.5. Vector Quantization Using K-Means

To implement vector quantization, we first identify the most appropriate number of clusters for K-means clustering. The process is described as follows:
  • Elbow Method for Determining Cluster Numbers:
First, we utilize the elbow method to determine the best number of clusters (K) for each category in the data. This essentially means plotting the Within-Cluster Sum of Squares (WCSS) against the number of clusters, then identifying the “elbow point” beyond which the WCSS stops declining substantially as the number of clusters increases. This point indicates the ideal number of clusters for the data.
By utilizing the elbow technique, the optimum number of clusters for each group is identified, guaranteeing that K-means effectively captures the hidden data patterns without overfitting or underfitting.
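For reference, the WCSS quantity plotted by the elbow method has the standard definition

$$\mathrm{WCSS}(K) = \sum_{j=1}^{K} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,$$

where $C_j$ is the set of points assigned to cluster $j$ and $\mu_j$ is its centroid.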
  • K-Means Clustering for Vector Quantization:
Once a suitable number of clusters (k) is found, the K-means algorithm is applied. Based on the similarity of their properties, it categorizes the feature vectors into k clusters. This happens iteratively: each point (x) belonging to X is first assigned to the closest cluster (c), and then each cluster center is recomputed from its assigned points.
Vector quantization is achieved by clustering the feature vectors with K-means so that each cluster has a representative centroid against which all points within that group are measured. In this way, information loss can be minimized during dimensionality reduction.
  • Maximum Clusters to Minimize Information Loss:
To avoid loss of information in the course of performing vector quantization, it is guaranteed that the number of clusters per class stays within a specified range, with the aim of avoiding over-compression of the data and the resulting distortion of information.
A balance between dimensionality reduction and the preservation of data structure is achieved by bounding the cluster count, thereby maximizing the effectiveness of subsequent classification models.
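A compact sketch of both steps using scikit-learn on synthetic data: the elbow curve is computed from the model inertia (the WCSS) over candidate values of K, and the chosen K then defines the codebook whose centroid indices serve as the discrete symbols. Here the elbow value is assumed to have been read off the plotted curve rather than selected by an automated rule.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))        # synthetic feature vectors for one activity class

# Elbow method: inertia_ is the within-cluster sum of squares (WCSS).
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)
# Plotting wcss against k and locating the "elbow" yields the cluster count.

k_opt = 5                            # assumed value read from the elbow plot
codebook = KMeans(n_clusters=k_opt, n_init=10, random_state=0).fit(X)
discrete_seq = codebook.predict(X)   # one centroid index per feature vector
print(discrete_seq[:10])
```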

3.6. Conversion to One-Hot Encoded Vector

Once the discrete sequences are obtained via vector quantization with K-means clustering, the next step involves transforming each sequence into a one-hot encoded vector. This process, as well as the rationale behind it, is explained for clarity.
  • Conversion Process:
Each symbol in a sequence acquired through vector quantization represents a definite cluster center that best describes the feature vectors of the corresponding group; these centers act as discrete labels for the data points. We use one-hot encoding to convert these discrete sequences into a format suitable for input into a CNN model. In this process, each discrete label is represented as a binary vector in which only one element is set to 1 and all others are set to 0. The position of the 1 element corresponds to the index of the centroid in the cluster space.
  • Reasoning for Conversion:
CNN models usually require the input data to be numerical vectors in which each dimension represents a particular feature. Feeding the sequences from vector quantization directly into the CNN model is therefore not appropriate, since the model would treat the centroid indices as ordinal values and fail to learn the relationships between the different categories. Instead, we convert the discrete data into a numerical representation that the CNN model can interpret. This transformation allows the model to treat each centroid as a separate category, enabling it to accurately capture the relationship between centroids and activity classes, which is crucial for satisfying the CNN model’s input conditions. Consistency across all input patterns is maintained, as each one-hot encoded vector has a length equal to the total number of centroids.
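The conversion itself is a single indexing operation; a minimal NumPy sketch follows, where each vector’s length equals the total number of centroids, as required above.

```python
import numpy as np

def to_one_hot(discrete_seq, n_centroids):
    """Map each centroid index to a binary vector of length n_centroids."""
    one_hot = np.zeros((len(discrete_seq), n_centroids), dtype=np.float32)
    one_hot[np.arange(len(discrete_seq)), discrete_seq] = 1.0
    return one_hot

seq = np.array([2, 0, 3, 2])          # discrete labels from vector quantization
print(to_one_hot(seq, n_centroids=5))
```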

3.7. Integration with CNN Models

After converting the discrete sequences to one-hot encoded vectors, the next step is to feed these vectors into a CNN model for classification.
Feeding one-hot encoded vectors into the CNN model [21] enables effective learning and classification of human activities. The model captures the complex patterns and relations that exist in the input data for accurate detection, using the hierarchical feature extraction abilities of CNNs.

4. Datasets

4.1. OPP79 Dataset

The OPP dataset [34] is a challenging human activity recognition (HAR) dataset characterized by imbalanced class distributions. It comprises data from four subjects, with a total of 18 daily kitchen activities, such as opening doors and closing drawers. The data were collected over five runs, each lasting approximately 30 min per subject. The sensors, namely 79 IMU sensors, were placed on the body of each participant at various locations to capture detailed movement data. The sampling rate was set at 30 Hz. We employ a hold-out evaluation protocol: the second run from subject 1 serves as the validation set, runs 4 and 5 from subjects 2 and 3 are used for testing, and the remaining data are used for training.

4.2. PAMAP2 Dataset

The PAMAP2 dataset [30] is a widely recognized HAR dataset containing data from 12 daily activities, such as running, walking, lying, and sitting. The dataset includes recordings from nine subjects, with each session lasting around 1 h. IMU data were collected from multiple sensors placed on the hand, chest, and ankle, capturing a variety of signals, including accelerometer, gyroscope, magnetometer, temperature, and heart-rate measurements, resulting in a total of 52 dimensions of data. The sampling frequency for the sensors was 100 Hz. We utilize a hold-out evaluation protocol, where runs 1 and 2 from subject 5 are designated for validation, runs 1 and 2 from subject 6 for testing, and the remaining data for training.

4.3. WISDM Dataset

The WISDM dataset [33] includes 1,098,207 instances of data from six distinct activities: walking, jogging, upstairs, downstairs, sitting, and standing. Data were collected using an accelerometer embedded in an Android smartphone, which was carried in the front leg pocket of 20 subjects while they performed the activities. Each recording session lasted approximately 1 h, with a sampling frequency of 20 Hz. The dataset captures a comprehensive view of the subjects’ movements across different activities.

5. Experiment and Results

In this section, we discuss the experimental setup, including how the dataset is generated; data preprocessing (the stage preceding the training of an Artificial Neural Network (ANN), which may involve transformations such as z-score normalization); vector quantization (the method used, including the typical block size); and the instantiation of the CNN architecture used to classify human activities.

5.1. Implementation

In the first stage of implementation, we generated a dataset for different human activities, like walking, running, sitting, and standing. To collect these data, devices such as accelerometers and gyroscopes were worn by people performing the activities, so the recorded information reflects real movements and provides raw data for the subsequent processing steps.
We determined the optimal number of clusters for each activity class with the help of the elbow method, as presented in Figure 2. By plotting the within-cluster sum of squares (WCSS) against the number of clusters, we identified the point at which the rate of decrease in WCSS declines considerably, signaling the best number of clusters (the cluster size, CS).
We first found the optimal number of clusters using the elbow method, then applied the K-means algorithm to group feature vectors into clusters based on their similarity. Hence, if k = 5 is the optimal number, we form 5 groups with similar feature vectors, and every data point is assigned to one cluster centroid. This process yields a lower-dimensional space without losing many key characteristics of the data.
Discrete sequences obtained by K-means clustering are converted into one-hot encoded vectors. Each discrete label is represented as a binary vector where one element is set to 1 and all others are set to 0. One-hot encoding transforms categorical data into a numerical representation suitable for input into the CNN model, enabling consistent dimensionality and effective learning of relationships between discrete labels and activity classes.
In the implementation of our proposed methodology, we used a CNN architecture for human activity recognition tasks. The CNN2D_3L model has multiple convolutional layers, followed by group normalization, max pooling, and fully connected layers. The model was trained on different datasets with different parameters using three layers. The model uses the Adam optimizer with a learning rate of 0.001. During training, data preprocessing techniques such as null-value removal, duplicate removal, and data augmentation were applied to enhance the diversity and quality of the training dataset. The model was trained for 10 epochs, with performance metrics monitored throughout the training process. After integration, the trained model showed stable behavior with satisfactory accuracy and loss curves.

5.2. CNN Model Implementation

Our CNN model, called CNN2D_3L, is designed to capture complex patterns in the input data through a series of convolutional and pooling layers, followed by a fully connected layer. The architecture is shown in Figure 3 and is described below.
The first convolutional layer plays an important role in extracting low-level features from the input data. This layer has a kernel size of (5, 1) and 256 filters, capturing basic patterns such as edges and textures. Group normalization with four groups normalizes the activations within each group, which helps the learning process. Subsequently, the feature map is subsampled using (2, 1) max pooling, reducing its dimensions and thus both the complexity and the computational cost. Complex patterns in this model are described using the non-linearity of Rectified Linear Units (ReLUs).
The second convolutional layer builds on the features learned by the previous layer to refine them and capture more complex patterns and relationships in the data. It preserves parameters identical to those of the first layer—256 filters, group normalization, max pooling, and ReLU activation—and thus continues extracting features across hierarchies.
The third and final convolutional layer refines the recognized patterns further. It employs the same settings as above, with 256 filters applied for feature mapping, followed by a group normalization operation and ReLU activation. By progressively examining the input data in this way, the network can extract the intricate patterns necessary for correct identification.
To reduce overfitting and improve the model’s generalization capability, a dropout layer is inserted before the fully connected layers. With a dropout probability of 0.5 , this layer randomly deactivates half of the neurons during training, forcing the model to learn robust features that are invariant to small variations in the input.
After feature extraction, the model proceeds to the classification stage, which consists of fully connected layers. The first fully connected layer reduces the dimensionality of the feature space, projecting the extracted features into a lower-dimensional representation. With 8960 input features and 128 output features, this layer facilitates the transformation of abstract features into interpretable representations.
Following the first fully connected layer, another ReLU activation function introduces non-linearity, enabling the model to learn complex mappings between features and activity classes. Dropout regularization with a probability of 0.5 is applied again to enhance the model’s resilience to overfitting.
The final fully connected layer serves as the output layer of the model, mapping the lower dimensional feature representation to the output space. With 128 input features and 12 output features (corresponding to the number of activity classes that differ depending on the dataset), this layer computes the probabilities of each class using a softmax activation function. The class with the highest probability is predicted as the output.
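The description above corresponds roughly to the following PyTorch sketch (not the authors’ code). The layer hyperparameters follow the text: 256 filters with (5, 1) kernels, group normalization with four groups, (2, 1) max pooling, dropout of 0.5, and a 128-unit hidden layer. The input window shape is an assumption, so LazyLinear is used to infer the first fully connected layer’s input dimension (8960 in the paper) rather than hard-coding it, and softmax is left to the loss function, as is conventional.

```python
import torch
import torch.nn as nn

class CNN2D_3L(nn.Module):
    """Sketch of the CNN2D_3L architecture described above."""

    def __init__(self, n_classes=12):
        super().__init__()

        def block(in_ch, pool):
            layers = [
                nn.Conv2d(in_ch, 256, kernel_size=(5, 1)),  # 256 filters, (5, 1) kernel
                nn.GroupNorm(4, 256),                       # four-group normalization
                nn.ReLU(),
            ]
            if pool:
                layers.append(nn.MaxPool2d((2, 1)))         # (2, 1) max pooling
            return nn.Sequential(*layers)

        self.features = nn.Sequential(
            block(1, pool=True),     # layer 1: low-level features
            block(256, pool=True),   # layer 2: refined patterns
            block(256, pool=False),  # layer 3: no pooling mentioned in the text
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),             # dropout before the fully connected layers
            nn.Flatten(),
            nn.LazyLinear(128),          # 8960 -> 128 in the paper; inferred here
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, n_classes),   # class scores; softmax applied in the loss
        )

    def forward(self, x):                # x: (batch, 1, time, 1)
        return self.classifier(self.features(x))

model = CNN2D_3L(n_classes=12)
x = torch.randn(8, 1, 160, 1)            # assumed input window shape, for illustration
print(model(x).shape)                    # torch.Size([8, 12])
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # as in Section 5.1
```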

5.3. Results

The performance of the proposed model is evaluated using accuracy and the F1 score. Table 1 summarizes the results compared to the techniques proposed in [3,4,35].
  • Results on the OPP79 Dataset
The performance of the model on the OPP79 dataset is evaluated using the F1 score shown in Figure 4 and the loss curve shown in Figure 5.
  • Results on the PAMAP2 Dataset
The performance of the model on the PAMAP2 dataset is evaluated using the F1 score shown in Figure 6 and the loss curve shown in Figure 7.
  • Results on the WISDM Dataset
We tested the WISDM dataset with different learning rates, but the F1 score stayed at 100% every time. This is because the dataset contains clean and well-labeled sensor data from both smartphones and smartwatches collected from 51 subjects performing 18 different activities. The activities are distinct and varied, making it easy for models to recognize patterns. Additionally, the sliding window technique used to process the time-series data helps improve model performance. Previous research has also shown similar high accuracy with this dataset [36,37]. Figure 8 and Figure 9 depict curves that were generated to check the impact of learning rates on the accuracy and loss of the model.

6. Discussion

In our study on human activity recognition (HAR), we found some clear strengths and challenges with our approach. One of the biggest strengths is that our model can accurately recognize complex activities using data from different sensors. We achieved impressive accuracy across several datasets, like OPP79, PAMAP2, and WISDM. This indicates that our method could be really useful in real-life situations, especially in healthcare, where accurately tracking patient activities can lead to better health outcomes.
We noticed that our model tended to overfit the training data, meaning it learned too much from the training examples and struggled with new data. Although we designed our approach to reduce this problem through data generation and transformation, the fact that overfitting still occurred shows we need to refine our methods further. For future improvements, we could look into better data augmentation techniques and feature extraction strategies to help the model perform better on new data.
Overall, our HAR approach shows great potential, but tackling these challenges will be crucial for its development. By focusing on stability and improving how well the model generalizes to new situations, we can create even more effective human activity recognition systems.

7. Conclusions

We investigated human activity recognition (HAR) with advanced modeling techniques and obtained effective results on a wide range of data. Focusing on the OPP79 [34], PAMAP2 [30], and WISDM [33] datasets, we showed that our proposed method outperforms existing models in terms of F1 score. Specifically, our model achieved 89% accuracy for OPP79, 93.24% for PAMAP2, and 100% for WISDM. These results demonstrate the model’s performance and accuracy in identifying complex activities captured by wearable devices.
While our approach has shown significant potential for practical applications, particularly in healthcare, where precise monitoring of patient activities is critical, we also encountered limitations that must be addressed. Challenges such as training instability and model overfitting indicate that further refinement is necessary. These issues can adversely affect model performance and generalization to new data.
Looking ahead, we aim to enhance our research by exploring additional data augmentation techniques and optimizing feature extraction methods. By addressing these challenges, we aspire to improve the model’s robustness and versatility, ultimately paving the way for more effective human activity recognition systems that can seamlessly integrate into everyday life.

Author Contributions

Conceptualization, I.F., A.A.F. and M.T.; methodology, I.F. and A.A.F.; software, I.F., A.A.F. and M.T.; validation, I.F., S.u.R. and H.A.A.; formal analysis, S.u.R., H.A.A. and F.T.; investigation, M.T. and A.A.F.; resources, S.u.R., H.A.A. and F.T.; data curation, M.T. and A.A.F.; writing—original draft preparation, I.F., A.A.F. and M.T.; writing—review and editing, S.u.R., H.A.A. and F.T.; visualization, M.T.; supervision, A.A.F. and M.T.; project administration, A.A.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) under Grant IMSIU-RG23147.

Data Availability Statement

The datasets used in this research are benchmark datasets that are publicly available. The first is the Actitracker dataset [32]; the second is taken from PAMAP2 Physical Activity Monitoring [30]; and the third is taken from the OPPORTUNITY Activity Recognition dataset [31].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Phukan, N.; Mohine, S.; Mondal, A.; Manikandan, M.S.; Pachori, R.B. Convolutional neural network-based human activity recognition for edge fitness and context-aware health monitoring devices. IEEE Sens. J. 2022, 22, 21816–21826.
  2. Das, D.; Nishimura, Y.; Vivek, R.P.; Takeda, N.; Fish, S.T.; Ploetz, T.; Chernova, S. Explainable activity recognition for smart home systems. ACM Trans. Interact. Intell. Syst. 2023, 13, 7.
  3. Shao, S.; Guan, Y.; Zhai, B.; Missier, P.; Plötz, T. ConvBoost: Boosting ConvNets for sensor-based activity recognition. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2023, 7, 75.
  4. Haresamudram, H.; Essa, I.; Ploetz, T. Towards Learning Discrete Representations via Self-Supervision for Wearables-Based Human Activity Recognition. Sensors 2024, 24, 1238.
  5. Plötz, T. If only we had more data!: Sensor-Based Human Activity Recognition in Challenging Scenarios. In Proceedings of the 2023 IEEE International Conference on Pervasive Computing and Communications Workshops and Other Affiliated Events (PerCom Workshops), Atlanta, GA, USA, 13–17 March 2023; pp. 565–570.
  6. Martinez-Rios, F.; Alvarez, L.A.G. A Transfer Learning Applied for Malaria Disease Detection on Blood Smear Images. In Proceedings of the 2023 19th International Symposium on Medical Information Processing and Analysis (SIPAIM), Mexico City, Mexico, 15–17 November 2023; pp. 1–7.
  7. Han, C.; Zhang, L.; Tang, Y.; Xu, S.; Min, F.; Wu, H.; Song, A. Understanding and improving channel attention for human activity recognition by temporal-aware and modality-aware embedding. IEEE Trans. Instrum. Meas. 2022, 71, 2513612.
  8. Concone, F.; Re, G.L.; Morana, M. A fog-based application for human activity recognition using personal smart devices. ACM Trans. Internet Technol. (TOIT) 2019, 19, 20.
  9. Batool, S.; Khan, M.H.; Farid, M.S. An ensemble deep learning model for human activity analysis using wearable sensory data. Appl. Soft Comput. 2024, 159, 111599.
  10. Yan, J.; Tang, X.; Zhou, Z.q.; Zhang, J.; Zhao, Y.; Li, S.; Luo, A. Sirtuins functions in central nervous system cells under neurological disorders. Front. Physiol. 2022, 13, 886087.
  11. Sharma, V.; Gupta, M.; Pandey, A.K.; Mishra, D.; Kumar, A. A review of deep learning-based human activity recognition on benchmark video datasets. Appl. Artif. Intell. 2022, 36, 2093705.
  12. Raza, N.; Naseer, A.; Tamoor, M.; Zafar, K. Alzheimer disease classification through transfer learning approach. Diagnostics 2023, 13, 801.
  13. Haresamudram, H.; Essa, I.; Plötz, T. Investigating enhancements to contrastive predictive coding for human activity recognition. In Proceedings of the 2023 IEEE International Conference on Pervasive Computing and Communications (PerCom), Atlanta, GA, USA, 13–17 March 2023; pp. 232–241.
  14. Swain, V.D.; Kwon, H.; Sargolzaei, S.; Saket, B.; Morshed, M.B.; Tran, K.; Patel, D.; Tian, Y.; Philipose, J.; Cui, Y.; et al. Leveraging WiFi network logs to infer student collocation and its relationship with academic performance. EPJ Data Sci. 2023, 12, 22.
  15. Tamoor, M.; Younas, I. Automatic segmentation of medical images using a novel Harris Hawk optimization method and an active contour model. J. X-Ray Sci. Technol. 2021, 29, 721–739.
  16. Park, H.; Lee, G.H.; Han, J.; Choi, J.K. Multiclass autoencoder-based active learning for sensor-based human activity recognition. Future Gener. Comput. Syst. 2024, 151, 71–84.
  17. Malik, Y.S.; Tamoor, M.; Naseer, A.; Wali, A.; Khan, A. Applying an adaptive Otsu-based initialization algorithm to optimize active contour models for skin lesion segmentation. J. X-Ray Sci. Technol. 2022, 30, 1169–1184.
  18. Thakur, D.; Biswas, S. Guided regularized random forest feature selection for smartphone based human activity recognition. J. Ambient Intell. Humaniz. Comput. 2023, 14, 9767–9779.
  19. Lara, O.D.; Labrador, M.A. A survey on human activity recognition using wearable sensors. IEEE Commun. Surv. Tutor. 2012, 15, 1192–1209.
  20. Gupta, N.; Gupta, S.K.; Pathak, R.K.; Jain, V.; Rashidi, P.; Suri, J.S. Human activity recognition in artificial intelligence framework: A narrative review. Artif. Intell. Rev. 2022, 55, 4755–4808.
  21. Andrade-Ambriz, Y.A.; Ledesma, S.; Ibarra-Manzano, M.A.; Oros-Flores, M.I.; Almanza-Ojeda, D.L. Human activity recognition using temporal convolutional neural network architecture. Expert Syst. Appl. 2022, 191, 116287.
  22. Bozkurt, F. A comparative study on classifying human activities using classical machine and deep learning methods. Arab. J. Sci. Eng. 2022, 47, 1507–1521.
  23. Hammerla, N.Y.; Halloran, S.; Plötz, T. Deep, convolutional, and recurrent models for human activity recognition using wearables. arXiv 2016, arXiv:1604.08880.
  24. Shan, S.; Guan, Y.; Guan, X.; Missier, P.; Plötz, T. On Training Strategies for LSTMs in Sensor-Based Human Activity Recognition. In Proceedings of the 2023 IEEE International Conference on Pervasive Computing and Communications Workshops and Other Affiliated Events (PerCom Workshops), Atlanta, GA, USA, 13–17 March 2023; pp. 653–658.
  25. Leng, Z.; Kwon, H.; Plötz, T. Generating virtual on-body accelerometer data from virtual textual descriptions for human activity recognition. In Proceedings of the 2023 ACM International Symposium on Wearable Computers, Cancun, Mexico, 8–12 October 2023; pp. 39–43.
  26. Tang, R.; Han, X.; Jiang, X.; Hu, X. Does synthetic data generation of LLMs help clinical text mining? arXiv 2023, arXiv:2303.04360.
  27. Li, X.; Jia, Y.; Chiu, C.C. Textless direct speech-to-speech translation with discrete speech representation. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
  28. Chen, Y.; Yuan, J.; Tian, Y.; Geng, S.; Li, X.; Zhou, D.; Metaxas, D.N.; Yang, H. Revisiting multimodal representation in contrastive learning: From patch and token embeddings to finite discrete tokens. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 15095–15104.
  29. Inoue, N.; Kikuchi, K.; Simo-Serra, E.; Otani, M.; Yamaguchi, K. LayoutDM: Discrete diffusion model for controllable layout generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 10167–10176.
  30. Reiss, A.; Stricker, D. Introducing a new benchmarked dataset for activity monitoring. In Proceedings of the 2012 16th International Symposium on Wearable Computers, Newcastle, UK, 18–22 June 2012; pp. 108–109.
  31. Roggen, D.; Calatroni, A.; Rossi, M.; Holleczek, T.; Förster, K.; Tröster, G.; Lukowicz, P.; Bannach, D.; Pirkl, G.; Ferscha, A.; et al. Collecting complex activity datasets in highly rich networked sensor environments. In Proceedings of the 2010 Seventh International Conference on Networked Sensing Systems (INSS), Kassel, Germany, 15–18 June 2010; pp. 233–240.
  32. Kwapisz, J.R.; Weiss, G.M.; Moore, S.A. Activity recognition using cell phone accelerometers. ACM SIGKDD Explor. Newsl. 2011, 12, 74–82.
  33. Agrawal, D.K.; Udgata, S.K.; Usaha, W. Leveraging Smartphone Sensor Data and Machine Learning Model for Human Activity Recognition and Fall Classification. Procedia Comput. Sci. 2024, 235, 1980–1989.
  34. Chavarriaga, R.; Sagha, H.; Calatroni, A.; Digumarti, S.T.; Tröster, G.; Millán, J.d.R.; Roggen, D. The Opportunity challenge: A benchmark database for on-body sensor-based activity recognition. Pattern Recognit. Lett. 2013, 34, 2033–2042.
  35. Shdefat, A.Y.; Mostafa, N.; Al-Arnaout, Z.; Kotb, Y.; Alabed, S. Optimizing HAR Systems: Comparative Analysis of Enhanced SVM and k-NN Classifiers. Int. J. Comput. Intell. Syst. 2024, 17, 150.
  36. Liu, T.; Wang, S.; Liu, Y.; Quan, W.; Zhang, L. A lightweight neural network framework using linear grouped convolution for human activity recognition on mobile devices. J. Supercomput. 2022, 78, 6696–6716.
  37. Dua, N.; Singh, S.N.; Semwal, V.B. Multi-input CNN-GRU based human activity recognition using wearable sensors. Computing 2021, 103, 1461–1478.
  38. Weiss, G. WISDM smartphone and smartwatch activity and biometrics dataset. UCI Mach. Learn. Repos. 2019, 7, 133190–133202.
Figure 1. High-level architecture of DiscHAR.
Figure 2. Elbow method to determine clusters within each activity class, where the x axis shows the number of clusters and the y axis represents distortion.
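For readers who want to reproduce the distortion curve in Figure 2, a minimal sketch follows. It assumes scikit-learn’s KMeans and a hypothetical feature matrix X of flattened activity windows; the paper’s exact preprocessing may differ.

import numpy as np
from sklearn.cluster import KMeans

def elbow_distortions(X, k_max=10):
    # Return the K-means distortion (inertia) for k = 1..k_max; the "elbow"
    # in this curve suggests the cluster count for each activity class.
    distortions = []
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        distortions.append(km.inertia_)  # sum of squared distances to centroids
    return distortions

# Synthetic stand-in for one activity class: 500 windows x 9 sensor channels.
X = np.random.rand(500, 9)
print(elbow_distortions(X, k_max=8))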
Figure 3. Detailed overview of the CNN model used in DiscHAR.
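Since Figure 3 is only referenced at a high level here, the following is an illustrative sketch of a small 1-D convolutional classifier over one-hot encoded discrete sequences, in the spirit of the DiscHAR pipeline. The window length, codebook size, class count, and layer sizes are placeholder assumptions, not the paper’s exact architecture.

import tensorflow as tf

SEQ_LEN = 128      # assumed window length
NUM_CODES = 20     # assumed K-means codebook size
NUM_CLASSES = 12   # assumed number of activity classes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN, NUM_CODES)),          # one-hot code sequences
    tf.keras.layers.Conv1D(64, 5, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(128, 5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])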
Figure 4. F1 score for the OPP79 [34] dataset, where the x axis shows epochs and the y axis represents the F1 score.
Figure 5. Training loss for the OPP79 [34] dataset, where the x axis shows epochs and the y axis represents the loss.
Figure 6. F1 score for the PAMAP2 [30] dataset, where the x axis shows epochs and the y axis represents the F1 score.
Figure 7. Training loss for the PAMAP2 [30] dataset, where the x axis shows epochs and the y axis represents the loss.
Figure 8. Accuracy for the WISDM [38] dataset, where the x axis shows epochs and the y axis represents the accuracy for different learning rates.
Figure 9. Loss curves for the WISDM [38] dataset, where the x axis shows epochs and the y axis represents the loss for different learning rates.
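Figures 8 and 9 compare training behavior across learning rates on WISDM. A minimal sketch of such a sweep is shown below; the rates, epochs, model, and synthetic data are placeholders rather than the paper’s settings.

import numpy as np
import tensorflow as tf

x = np.random.rand(256, 128, 20).astype("float32")            # synthetic inputs
y = tf.keras.utils.to_categorical(np.random.randint(0, 6, 256), 6)

histories = {}
for lr in (1e-2, 1e-3, 1e-4):
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(128, 20)),
        tf.keras.layers.Conv1D(32, 5, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(6, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    histories[lr] = model.fit(x, y, epochs=5, validation_split=0.2, verbose=0)
# histories[lr].history["accuracy"] and ["loss"] give the per-rate curves plotted.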
Table 1. Comparison of F1 scores across different datasets.

Dataset | F1 Score [3]  | F1 Score [4]  | F1 Score [35] | F1 Score (Our Model)
OPP79   | 72.81 ± 0.76  | –             | 86.89%        | Training: 85.46%; Testing: 89%
PAMAP2  | 90.05 ± 0.56  | 60.25 ± 0.72  | 86.37%        | Training: 94%; Testing: 93.24%
WISDM   | –             | –             | 96.8%         | Training: 100%; Testing: 100%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
