This part contains a step-by-step description of our work, divided into the following sections. First,
Section 3.1 presents the procedure carried out to collect the data. Then, in
Section 3.2, we describe how the data was prepared to use once the data collection was over, as well as the features extracted from them. Finally,
Section 3.3 offers a summary of the classification algorithm applied.
3.1. Data Collection
Data collection was made through an Android app developed by the authors that allowed an easy recording, labeling and storage of the data. To do this, we organized an initial data collection that lasted about a month, to see what data we were getting and to be able to do some initial tests on it. Later, we carried out another more intensive collection, over a period of about a week, to alleviate the imbalances and weaknesses found in the previous gathering. Each of the people who took part in the study was asked to set the activity they were going to perform at each moment, through that Android app, before starting the data collection. In this way, once the activity was selected, the gathering of such data was automatically started, until the user indicated the end of such activity. Hence, each stored session corresponds to a specific activity, carried out by a particular individual. Regarding the activities performed, they were four:
Inactive: not carrying the mobile phone. For example, the device is on the desk while the individual performs another kind of activities.
Active: carrying the mobile phone, moving, but not going to a particular place. In other words, this means that, for example, making dinner, being in a concert, buying groceries or doing the dishes count as “active” activities.
Walking: Moving to a specific place. In this case, running or jogging count as a “walking” activity.
Driving: Moving in a means of transport powered by an engine. This would include cars, buses, motorbikes, trucks and any similar.
The data collected comes from four different sensors: accelerometer, gyroscope, magnetometer and GPS. We selected accelerometer and gyroscope because they are the most used in the literature and the ones that showed the best results. We also added the magnetometer and GPS because we think they could be useful in this problem. In fact, in our case, GPS should be essential to differentiate the activities performed by being able to detect the user’s movement speed who carries the smartphone.
We save the data of the accelerometer, the gyroscope and the magnetometer with their tri-axial values. In the case of GPS, we store the device’s increments in latitude, longitude and altitude, as well as the bearing, speed and accuracy of the collected measurements. Also, for the accelerometer, we used the gravity sensor, subtracting the last reading of the latter from the observations of the first. In this way, we get clear accelerometer values (linear accelerometer), as they are not affected by the smartphone’s orientation. Therefore, we obtain a dataset independent of the place where the individual is, as well as of the device’s bearings.
On the other hand, before saving the data locally, a series of filters are applied. In the case of the accelerometer and magnetometer, we use a low-pass filter to avoid too much noise in these sensor’s measurements. Concerning the gyroscope, to bypass the well-known gyro drift, a high-pass filter was used instead. Nevertheless, we also had to deal with Android’s sensor frequency problem, as we cannot set the same frequency for each one of them. In our case, this is especially problematic, having to join data from very high-frequency sensors such as the accelerometer, with a low-frequency sensor, such as the GPS. From the latter, we obtain new measurements every ten seconds, approximately, compared to the possible ten, or even 50, measurements per second we can get from the accelerometer. Anyhow, given the inability to set a frequency in Android and having to take the values as they are offered by the system itself, there may be gaps in the measurements. These gaps are especially problematic in the case of GPS, where there may be cases where no new measurements were obtained in more than a minute (although perhaps this is mainly due to the difficulty of accessing closed environments). Such gaps also occur in the case of the accelerometer, gyroscope or magnetometer, despite offering about 10, 5 or 8 measurements per second, respectively, in the most stable cases. In these cases, the gaps are between 1 and 5 s, and occur mostly at the start of each data collection session, although much less frequently than with GPS. In this way, in
Table 2, we show the average number of recordings per second for each sensor and each activity measured, as well as the resulting average frequency. Below each average value, in a smaller size, we also show the standard deviation for each class. Please note that for moving activities such as “active” or “walking” there is an increase in these measurements, especially with the accelerometer. This is because the smartphone detects these movements and, to get the most information, its frequency is increased automatically to get the maximum number of measurements. However, this increase also occurs during “driving” activity, even more so. Vibrations due to the car use may be the cause of this increase, as they might also be detected by the sensors of the smartphone. Additionally, in “walking” and “active” activities there may be certain inactive intervals (like waiting for a traffic light to go green or just standing doing something, respectively) that lower these averages.
In this way, the final distribution of the activities in our dataset is the one shown in
Table 3. In this table, we measured the total time recorded, the number of recordings, the number of samples and the percentage of data (this one related to the number of samples), for each of the activities we specified. Here, each recording refers to a whole activity session, since the individuals begin an action until they stop it; while each sample is related to a single sensor measurement. As can be seen, there are less samples on “inactive” activities in proportion to the total time recorded. This is because the frequency of the sensors increases with activities that require more movement, as explained above, so in these cases they remained at a lower value. Therefore, the total percentage of the data may give a wrong view of the total data distribution, once the sliding windows are applied. This is because, by using these windows on which to compute a series of features, the number of samples actually moves into second place, with the total time recorded being the most important value. The more total time recorded, the more sliding windows computed, and the more patterns for that class. Hence, there would be a much clearer imbalance in the dataset, where “inactive” activity would have three times as many patterns as in the case of “walking”. Regarding the number of recordings made, there are far more with the “walking” activity than with the rest. Anyhow, we consider that the dataset remains useful and feasible to implement models that could distinguish these activities. Moreover, the total number of individuals who participated in the study was 19. Therefore, the dataset also contains different kinds of behaviors that end up enriching the possible models developed later.
On the other hand, there is also another problem in Android, as not all devices contain a gyroscope or a magnetometer to this day. While it is mandatory to have an accelerometer and a GPS, a gyroscope or a magnetometer are not compulsory in older versions of Android. In this way, some of our users took measurements without including these sensors. In
Table 4 and
Table 5, we show the number of samples that do not include a gyroscope or a gyroscope and a magnetometer simultaneously, as the people who did not have a magnetometer did not have a gyroscope either. Something important to highlight in these tables is the difference in the relation between the number of samples and the time recorded compared to the one showed in
Table 3. Here, the number of samples is much higher in relation to the time recorded. This may explain the strange data that we pointed out before in
Table 2, as the accelerometer may increase more its frequency in general, by becoming the only sensor to detect motion. On another note, the percentages we show in this table are related to the whole amount of data, from
Table 3. Fortunately, these percentages are quite low, and the dataset is not as affected by this problem. Anyhow, it will be something to keep in mind when preparing the data to be applied to a future AI model.
3.2. Data Preparation and Feature Extraction
After having collected the data, we proceed to prepare them to be introduced later in the model. To do so, and taking into account the well-known time-series segmentation problem in HAR, we opted to use sliding windows of 20 s, with an overlap of 19 s (95%). We chose 20 s because it is the most we have seen used in this field. Moreover, we consider that our activities, being long-themed, need a large window size to be correctly detected. We thought even a greater size could be beneficial, but we decided to be conservative and see what happens with a smaller one. As for the overlap, we chose the maximum possible that would allow us to have comfortable handling of the data, as well as a higher number of patterns, with one second between windows. In this way, we get around half a million patterns, on a quite long time window, compared to previous works. Additionally, with this distribution, we hope to get reliable results for the movements we are analyzing, as they are long-themed (inactive, active, walking and driving).
However, to apply these windows, it is first necessary to pre-process the data. The algorithm implemented to do so consists of deleting rows that met one or more of the following properties:
GPS increments in latitude, longitude and altitude that are higher than a given threshold, obtained from a prior, and very conservative, data study. We detected that there were occasional “jumps” in our GPS-related values, as some of these observations were outside the expected trajectory. For this reason, we decided to fix a threshold of 0.2 for latitude and longitude increments, and 500 for the altitude ones. In this way, any value that is too far out of line is eliminated, keeping those that are closer to the expected.
Timestamps that do not match the structure defined (yyyy-MM-dd HH:mm:ss.ZZZ) or that do not correspond to an actual date (year 1970 values, for example).
Any misplaced value between timestamp and z-axis magnetometer, which showed to appear in some very few observations at the beginning of the project.
Table 6 shows the mean and standard deviation values of each sensor for each of the activities studied, after the application of this algorithm. To correctly understand the values indicated in this table, it is important to explain what each of these sensors measures. The accelerometer values correspond to the acceleration force applied to the smartphone on the three physical axes (
), in m/s
. On the other hand, the gyroscope measures in rad/s the smartphone’s rotation speed around each of the three physical axes (
). Regarding the magnetometer, it measures the environmental geomagnetic field of the three physical axes (
) of the smartphone, in
T. As for the GPS, its values correspond, on the one hand, to the increments of the values of the geographical coordinates, longitude and latitude, in which the smartphone is located, with respect to the previous measurement. Similarly, the increments in altitude, in meters, were also measured. Then, the values of speed, bearing and accuracy were also taken into account. Speed was measured in m/s and specifies the speed that is taking the smartphone. The bearing measured the horizontal direction of travel of the smartphone, in degrees. Finally, accuracy values refer to the deviation from the actual smartphone location, in meters, where the smaller the value, the better the accuracy of the measurement. Going back to
Table 6, in each cell, the values corresponding to the mean are at the top and, at the bottom, in a smaller size, the standard deviation values. Each pair of values corresponds to the set that forms each sensor. In the case of the accelerometer, gyroscope and magnetometer, these refer to the values related to their “
X”, “
Y” and “
Z” axes. As for the GPS, this set is formed by the latitude increments (Lat.), the longitude increments (Long.), the altitude increments (Alt.), the speed (Sp.), the bearing (Bear.) and the accuracy (Acc.) of every measurement. Here, it is worth noting some rare data, such as those relating to GPS “inactive” activity, where the values are very high concerning what is expected from such action. In this case, we consider that these values are due to the fact that such activity is carried out in indoor environments, which are not so accessible for GPS. Even so, as can be seen, there are some clear differences between the activities, so the possibilities of identification with future models are more than feasible.
After applying previous preprocessing, since data collection required the user to tap a button before performing the activity, we eliminated the first five seconds of each activity collection. In the same way, we did so with the final five seconds of each measurement. Hence, we can prevent the future models from ending up learning the movement that precedes the start or the end of the action, such as, for example, putting the smartphone in the pocket or pulling it out. While doing this, we also get rid of those sessions that have quite large gaps between the data (at least five seconds) for any sensor other than the GPS, by considering them as invalid. In this way, in
Table 7, the final results after the application of this sliding window and overlap are shown for the samples containing all the sensors. As we already addressed in the previous section, although at the sample level the data may appear lower for activities such as inactive or walking, at the final pattern level the results are much different.
Later, we had to go through a transformation process to extract the features and apply all the information needed for the classification algorithm. Due to GPS’ low frequency, to carry out this feature extraction, it was necessary to previously replicate some of the data stored by this sensor, for each of the windows applied. To do this, if the difference between one observation and the next differed in a longer time than one second, the latter measurement is replicated, with a different timestamp. For this reason, all sessions that do not contain at least one GPS observation are removed from the list of valid ones for this process. We repeat this step until all the windows that may be in the middle are correctly filled. We selected one second as the amount of time to be between each sample, so there is always at least one observation in each of the windows applied, making the final feature extraction match to the data obtained. After that, for each set of measurements, we computed six different types of features, each generating a series of inputs for the AI model. The features used were: mean, variance, median absolute deviation, maximum, minimum and interquartile range, all based in the time domain. All of them were used in previous works like [
16], with remarkable results. In this way, we maintain the simplicity of the model, being able to complicate it or change it in future works according to the results we achieve.
3.3. Classification Algorithm
As already indicated in the related work section, there are many kinds of models used in HAR. In our case, we chose to employ an SVM model. Although SVM showed excellent results with rather short-themed activities, we consider it interesting to test it as an initial model in our dataset. It is one of the most used models in HAR, applied in works such as [
9,
16] and, more recently, in [
23], all with outstanding overall performance in this field, as well as being a simple and straightforward AI model.
An SVM is a supervised machine learning model that uses classification algorithms for two-group classification problems. After giving an SVM model tagged training data sets for either category, they can categorize new examples. To do this, the SVM looks for the hyperplane that maximizes the margins between the two classes. In other words, it looks for the hyperplane whose distance from the nearest element in each category is the highest. Hither, non-linearity is achieved through kernel functions, which implicitly map the data to a more dimensional space where this linear approximation is applied. On the other hand, other hyperparameters such as C or gamma also affect the definition of this hyperplane. As for C, it marks the width of the margins of this hyperplane, as well as the number of errors that are accepted. Concerning gamma, it directly affects the curve of the hyperplane, making it softer or more accentuated, depending on the patterns that are introduced into the model.
While SVM is typically used to solve binary classification tasks, it can also be used in multi-class problems. To do this, it is necessary to use a one-vs-all or one-vs-one strategy. The first case is designed to model each class against all other classes independently. In this way, a classifier is created for each situation. On the other hand, the second case is used to model each pair of classes separately, performing various binary classifications, until a final result is found. In our case, we will be using a one-vs-all approach, as it is the most used one in the literature. For this, we implemented it on Python, using the functions provided with Scikit-learn.