Fusing Object Information and Inertial Data for Activity Recognition †
Figure 1. Distribution of classes per test subject on a logarithmic scale, as the majority of class labels belong to the none class. The majority class (excluding the none class) differs between subjects.
Figure 2. Sensor placement. The subject wears the wearable devices on the head, chest, forearm, and thigh (top to bottom).
Figure 3. Sensor data collector application. The application can record a wide range of sensors on Android devices, including, for example, inertial data, temperature, and audio.
Figure 4. Example bounding boxes. The image shows a typical frame captured by our smart glasses. We draw a bounding box for each object, even if it is only partly visible, and tag each box with the visibility state of the object.
Figure 5. Distribution of the classes we consider from the CMU-MMAC dataset using the annotations from [18]. The class label is derived from the verb part of the original label.
Figure 6. Windowing of inertial data. Windows have a length of 1 s and an overlap of 50% or 75%.
Figure 7. Pipeline for the image feature generation.
Figure 8. Pipeline for the fusion of the modalities. The top pipeline shows our early fusion method, the bottom one our late fusion approach.
Figure 9. Distribution of the classes we consider from the CMU-MMAC dataset. The class label is derived from the verb part of the original label.
Abstract
1. Introduction
- We collected a new dataset with two subjects performing a set of activities in two different environments, focusing on activities that are hard to distinguish because they involve similar motions (e.g., eating and drinking) and are often interleaved. Each subject performed the activities in different body positions and at different speeds. Currently, few datasets cover these scenarios; thus, other researchers in the field can test their approaches on this dataset.
- We present a new method and a baseline comparison for multimodal activity recognition that uses deep learning models for object detection, and we evaluate this method on our presented dataset, achieving an F-measure of 79.6%. We also apply our method to the CMU-MMAC [17] dataset and show that we outperform previous work on the same dataset. Additionally, we tested our method on a larger subset of the CMU-MMAC dataset, as a recent publication offers more annotations [18].
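To make the early and late fusion strategies described in the figure captions above concrete, the following is a minimal, illustrative sketch: early fusion concatenates per-window inertial features with an object-presence vector derived from detections and trains a single classifier, while late fusion averages the class probabilities of two per-modality classifiers. The feature arrays, object vocabulary, and classifier settings are hypothetical placeholders, not the exact configuration used in the paper.

```python
# Illustrative sketch only: early vs. late fusion of IMU features and
# object-detection features. Inputs (feature matrices, label arrays,
# object vocabulary) are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def object_presence_vector(detected_labels, vocabulary):
    """Binary indicator of which known objects were detected in a window."""
    vec = np.zeros(len(vocabulary), dtype=float)
    for label in detected_labels:
        if label in vocabulary:
            vec[vocabulary.index(label)] = 1.0
    return vec

def train_early_fusion(imu_features, vis_features, labels):
    """Early fusion: concatenate both modalities and train one classifier."""
    fused = np.hstack([imu_features, vis_features])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return clf.fit(fused, labels)

def predict_late_fusion(clf_imu, clf_vis, imu_features, vis_features):
    """Late fusion: average class probabilities of two per-modality classifiers.

    Assumes both classifiers were trained on the same set of class labels.
    """
    proba = 0.5 * (clf_imu.predict_proba(imu_features)
                   + clf_vis.predict_proba(vis_features))
    return clf_imu.classes_[np.argmax(proba, axis=1)]
```

In the result tables below, the RF_/LR_ prefixes correspond to the two classifier families (random forest and logistic regression), and the _EARLY/_LATE suffixes to these two fusion strategies.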
2. Related Work
2.1. Image Object Detection
2.2. Activity Recognition Based on Objects
2.3. Activity Recognition Based on Inertial Data
2.4. Multimodal Activity Recognition
3. Dataset
3.1. ADL Dataset
3.2. CMU-MMAC: Quality of Life Dataset
3.3. CMU-MMAC: New Annotations
4. Methods
4.1. Acceleration Data
4.2. Video
4.3. Combining Both Modalities
5. Experiments
5.1. ADL Dataset
5.2. CMU-MMAC Dataset
5.3. CMU-MMAC: New Annotations
- Number of estimators
- Maximal depth of trees
- Minimum samples per leaf and per split
- Number of features to consider when splitting (all, √n, or log₂ n)
- Number of iterations
- Optimizer type (newton, simple)
- Regularization strength C
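One possible way to realize the search over the hyperparameters listed above is a grid search with scikit-learn, sketched below. The value ranges are illustrative placeholders, and the "newton-cg" and "lbfgs" solvers stand in for the "newton" and "simple" optimizers named above; this is not the grid used in the paper.

```python
# Sketch of a hyperparameter search over the parameters listed above.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rf_grid = {
    "n_estimators": [50, 100, 200],          # number of estimators
    "max_depth": [None, 10, 20],             # maximal depth of trees
    "min_samples_leaf": [1, 2, 5],           # minimum samples per leaf
    "min_samples_split": [2, 5, 10],         # minimum samples per split
    "max_features": [None, "sqrt", "log2"],  # features considered per split
}
lr_grid = {
    "max_iter": [100, 500, 1000],            # number of iterations
    "solver": ["newton-cg", "lbfgs"],        # optimizer type
    "C": [0.01, 0.1, 1.0, 10.0],             # regularization strength C
}

rf_search = GridSearchCV(RandomForestClassifier(), rf_grid,
                         scoring="f1_macro", cv=3)
lr_search = GridSearchCV(LogisticRegression(), lr_grid,
                         scoring="f1_macro", cv=3)
# rf_search.fit(X_train, y_train); lr_search.fit(X_train, y_train)
```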
6. Discussion
7. Conclusions and Future Work
Author Contributions
Funding
Conflicts of Interest
References
- Nguyen, T.H.C.; Nebel, J.C.; Florez-Revuelta, F. Recognition of activities of daily living with egocentric vision: A review. Sensors 2016, 16, 72. [Google Scholar] [CrossRef] [PubMed]
- Sztyler, T.; Stuckenschmidt, H. On-body Localization of Wearable Devices: An Investigation of Position-Aware Activity Recognition. In Proceedings of the IEEE International Conference on Pervasive Computing and Communications (PerCom), Sydney, Australia, 14–19 March 2016; pp. 1–9. [Google Scholar]
- Song, S.; Chandrasekhar, V.; Mandal, B.; Li, L.; Lim, J.H.; Babu, G.S.; San, P.; Cheung, N.M. Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 378–385. [Google Scholar]
- Abebe, G.; Cavallaro, A. Hierarchical modeling for first-person vision activity recognition. Neurocomputing 2017, 267, 362–377. [Google Scholar] [CrossRef]
- Lawton, M.P.; Brody, E.M. Assessment of older people: Self-maintaining and instrumental activities of daily living. Gerontologist 1969, 9, 179–186. [Google Scholar] [CrossRef]
- Allin, S.; Bharucha, A.; Zimmerman, J.; Wilson, D.; Robinson, M.; Stevens, S.; Wactlar, H.; Atkeson, C. Toward the automatic assessment of behavioral disturbances of dementia. In Proceedings of the 2003 International Conference on Ubiquitous Computing (UbiComp), Seattle, WA, USA, 12–15 October 2003. [Google Scholar]
- Hori, T.; Nishida, Y.; Murakami, S. Pervasive sensor system for evidence-based nursing care support. In Proceedings of the 2006 IEEE International Conference on Robotics and Automation, Orlando, FL, USA, 15–19 May 2006; pp. 1680–1685. [Google Scholar]
- Wilson, D.H. Assistive Intelligent Environments for Automatic Health Monitoring; Carnegie Mellon University: Pittsburgh, PA, USA, 2005. [Google Scholar]
- Nam, Y.; Rho, S.; Lee, C. Physical Activity Recognition Using Multiple Sensors Embedded in a Wearable Device. ACM Trans. Embed. Comput. Syst. 2013, 12, 26:1–26:14. [Google Scholar] [CrossRef]
- Weiss, G.M.; Timko, J.L.; Gallagher, C.M.; Yoneda, K.; Schreiber, A.J. Smartwatch-based activity recognition: A machine learning approach. In Proceedings of the 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics, Las Vegas, NV, USA, 24–27 February 2016; pp. 426–429. [Google Scholar] [CrossRef]
- Pirsiavash, H.; Ramanan, D. Detecting activities of daily living in first-person camera views. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2847–2854. [Google Scholar] [CrossRef]
- Riboni, D.; Sztyler, T.; Civitarese, G.; Stuckenschmidt, H. Unsupervised Recognition of Interleaved Activities of Daily Living through Ontological and Probabilistic Reasoning. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Heidelberg, Germany, 12–16 September 2016; pp. 1–12. [Google Scholar] [CrossRef]
- Betancourt, A.; Morerio, P.; Regazzoni, C.S.; Rauterberg, M. The Evolution of First Person Vision Methods: A Survey. IEEE Trans. Circuits Syst. Video Technol. 2015, 25, 744–760. [Google Scholar] [CrossRef] [Green Version]
- Spriggs, E.H.; De La Torre, F.; Hebert, M. Temporal segmentation and activity classification from first-person sensing. In Proceedings of the IEEE Computer Society Conference On Computer Vision and Pattern Recognition Workshops, Miami, FL, USA, 20–25 June 2009; pp. 17–24. [Google Scholar]
- Windau, J.; Itti, L. Situation awareness via sensor-equipped eyeglasses. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 5674–5679. [Google Scholar] [CrossRef]
- Amazon Technologies, Inc. Transitioning items from a materials handling facility. U.S. Patent US20150012396A1, 8 January 2015. [Google Scholar]
- De la Torre, F.; Hodgins, J.; Bargteil, A.; Martin, X.; Macey, J.; Collado, A.; Beltran, P. Guide to the Carnegie Mellon University Multimodal Activity (CMU-MMAC) Database; Robotics Institute: Pittsburgh, PA, USA, 2008; p. 135. [Google Scholar]
- Yordanova, K.; Krüger, F.; Kirste, T. Providing semantic annotation for the CMU grand challenge dataset. In Proceedings of the 2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), Athens, Greece, 19–23 March 2018; pp. 579–584. [Google Scholar]
- Diete, A.; Sztyler, T.; Stuckenschmidt, H. Vision and acceleration modalities: Partners for recognizing complex activities. In Proceedings of the 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), Kyoto, Japan, 11–15 March 2019; pp. 101–106. [Google Scholar]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Computer Vision (ECCV); Springer International Publishing: Berlin, Germany, 2014; pp. 740–755. [Google Scholar] [Green Version]
- Huang, J.; Rathod, V.; Sun, C.; Zhu, M.; Korattikara, A.; Fathi, A.; Fischer, I.; Wojna, Z.; Song, Y.; Guadarrama, S.; et al. Speed/accuracy trade-offs for modern convolutional object detectors. arXiv 2017, arXiv:1611.10012. [Google Scholar]
- Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning transferable architectures for scalable image recognition. arXiv 2017, arXiv:1707.07012. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Damen, D.; Doughty, H.; Farinella, G.M.; Fidler, S.; Furnari, A.; Kazakos, E.; Moltisanti, D.; Munro, J.; Perrett, T.; Price, W.; et al. Scaling Egocentric Vision: The EPIC-KITCHENS Dataset. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Kumar, A.; Yordanova, K.; Kirste, T.; Kumar, M. Combining off-the-shelf Image Classifiers with Transfer Learning for Activity Recognition. In Proceedings of the 5th International Workshop on Sensor-Based Activity Recognition and Interaction, Berlin, Germany, 20–21 September 2018; p. 15. [Google Scholar]
- Wu, J.; Osuntogun, A.; Choudhury, T.; Philipose, M.; Rehg, J.M. A Scalable Approach to Activity Recognition based on Object Use. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio de Janeiro, Brazil, 14–21 October 2007; pp. 1–8. [Google Scholar] [CrossRef]
- Lei, J.; Ren, X.; Fox, D. Fine-grained kitchen activity recognition using rgb-d. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing, Pittsburgh, PA, USA, 5–8 September 2012; pp. 208–211. [Google Scholar]
- Maekawa, T.; Yanagisawa, Y.; Kishino, Y.; Ishiguro, K.; Kamei, K.; Sakurai, Y.; Okadome, T. Object-based activity recognition with heterogeneous sensors on wrist. In International Conference on Pervasive Computing; Springer: Berlin, Germany, 2010; pp. 246–264. [Google Scholar]
- Kwapisz, J.R.; Weiss, G.M.; Moore, S.A. Activity Recognition Using Cell Phone Accelerometers. SIGKDD Explor. Newsl. 2011, 12, 74–82. [Google Scholar] [CrossRef]
- Preece, S.J.; Goulermas, J.Y.; Kenney, L.P.J.; Howard, D. A Comparison of Feature Extraction Methods for the Classification of Dynamic Activities From Accelerometer Data. IEEE Trans. Biomed. Eng. 2009, 56, 871–879. [Google Scholar] [CrossRef] [PubMed]
- San-Segundo, R.; Montero, J.M.; Barra-Chicote, R.; Fernández, F.; Pardo, J.M. Feature extraction from smartphone inertial signals for human activity segmentation. Signal Process. 2016, 120, 359–372. [Google Scholar] [CrossRef]
- Delahoz, Y.S.; Labrador, M.A. Survey on fall detection and fall prevention using wearable and external sensors. Sensors 2014, 14, 19806–19842. [Google Scholar] [CrossRef] [PubMed]
- Wang, H.; Zhang, D.; Wang, Y.; Ma, J.; Wang, Y.; Li, S. RT-Fall: A Real-Time and Contactless Fall Detection System with Commodity WiFi Devices. IEEE Trans. Mob. Comput. 2017, 16, 511–526. [Google Scholar] [CrossRef]
- Krupitzer, C.; Sztyler, T.; Edinger, J.; Breitbach, M.; Stuckenschmidt, H.; Becker, C. Hips do lie! A position-aware mobile fall detection system. In Proceedings of the 2018 IEEE International Conference on Pervasive Computing and Communications (PerCom), Athens, Greece, 19–23 March 2018; pp. 1–10. [Google Scholar]
- Ordóñez, F.J.; Roggen, D. Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors 2016, 16, 115. [Google Scholar] [CrossRef] [PubMed]
- Chen, C.; Jafari, R.; Kehtarnavaz, N. Utd-mhad: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; pp. 168–172. [Google Scholar]
- Song, S.; Cheung, N.M.; Chandrasekhar, V.; Mandal, B.; Lin, J. Egocentric activity recognition with multimodal fisher vector. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 2717–2721. [Google Scholar] [CrossRef]
- Kelly, J.; Sukhatme, G.S. Visual-inertial sensor fusion: Localization, mapping and sensor-to-sensor self-calibration. Int. J. Robot. Res. 2011, 30, 56–79. [Google Scholar] [CrossRef]
- Armesto, L.; Chroust, S.; Vincze, M.; Tornero, J. Multi-rate fusion with vision and inertial sensors. In Proceedings of the 2004 IEEE International Conference on Robotics and Automation, New Orleans, LA, USA, 26 April–1 May 2004; Volume 1, pp. 193–199. [Google Scholar]
- Friard, O.; Gamba, M. BORIS: A free, versatile open-source event-logging software for video/audio coding and live observations. Methods Ecol. Evol. 2016, 7, 1325–1330. [Google Scholar] [CrossRef]
- Vondrick, C.; Patterson, D.; Ramanan, D. Efficiently scaling up crowdsourced video annotation. Int. J. Comput. Vis. 2013, 101, 184–204. [Google Scholar] [CrossRef]
- Zhang, G.; Piccardi, M. Structural SVM with partial ranking for activity segmentation and classification. IEEE Signal Process. Lett. 2015, 22, 2344–2348. [Google Scholar] [CrossRef]
- Diete, A.; Sztyler, T.; Weiland, L.; Stuckenschmidt, H. Recognizing grabbing actions from inertial and video sensor data in a warehouse scenario. Procedia Comput. Sci. 2017, 110, 16–23. [Google Scholar] [CrossRef]
- Diete, A.; Sztyler, T.; Stuckenschmidt, H. Exploring Semi-Supervised Methods for Labeling Support in Multimodal Datasets. Sensors 2018, 18, 2639. [Google Scholar] [CrossRef] [PubMed]
Time Domain | Frequency Domain
---|---
Mean, Median, Standard Deviation, Variance, Interquartile Range, MAD, Kurtosis, Correlation Coefficient, Gravity, Orientation, Entropy (Time) | Energy, Entropy (Frequency), MeanDC
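The windowing described above (1 s windows with 50% or 75% overlap) and a subset of the features in the table can be sketched as follows. The sampling rate and the chosen feature subset are illustrative assumptions, not the exact implementation used in the paper.

```python
# Sketch of window segmentation and a few time-/frequency-domain features.
import numpy as np

def sliding_windows(signal, sampling_rate, window_seconds=1.0, overlap=0.5):
    """Yield fixed-length windows of a 1-D signal with the given overlap."""
    length = int(window_seconds * sampling_rate)
    step = max(1, int(length * (1.0 - overlap)))
    for start in range(0, len(signal) - length + 1, step):
        yield signal[start:start + length]

def window_features(window):
    """Compute a small subset of the listed time- and frequency-domain features."""
    spectrum = np.abs(np.fft.rfft(window))
    power = spectrum ** 2
    prob = power / (power.sum() + 1e-12)
    return {
        "mean": np.mean(window),
        "median": np.median(window),
        "std": np.std(window),
        "variance": np.var(window),
        "iqr": np.percentile(window, 75) - np.percentile(window, 25),
        "mad": np.median(np.abs(window - np.median(window))),
        "energy": power.sum() / len(window),   # one common definition
        "entropy_freq": -np.sum(prob * np.log2(prob + 1e-12)),
    }

# Example: features for each 1 s window of a 50 Hz accelerometer axis.
# feats = [window_features(w) for w in sliding_windows(acc_x, sampling_rate=50)]
```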
Config | Precision | Recall | F-Measure
---|---|---|---
RF_IMU | | |
LR_IMU | | |
RF_VIS_GT | | |
LR_VIS_GT | | |
RF_VIS_LEARN | | |
LR_VIS_LEARN | | |
RF_ALL_GT | | |
LR_ALL_GT | | |
RF_ALL_LEARN | | |
LR_ALL_LEARN | | |
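The precision, recall, and F-measure values reported in these tables are assumed to follow the standard definitions (macro-averaged over classes); for a single class:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$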
Class | Precision | Recall | F-Measure
---|---|---|---
none | | |
drink_water | | |
eat_banana | | |
eat_bread | | |
prepare_bread | | |
take_meds | | |
wipe_mouth | | |
Config | Precision | Recall | F-Measure
---|---|---|---
RF_ALL | | |
LR_ALL | | |
RF_IMU | | |
LR_IMU | | |
RF_VIS | | |
LR_VIS | | |
Class | Precision | Recall | F-Measure
---|---|---|---
close | | |
crack | | |
none | | |
open | | |
pour | | |
put | | |
read | | |
spray | | |
stir | | |
switch_on | | |
take | | |
twist_off | | |
twist_on | | |
walk | | |
Class | Baseline (*) | SSVM (*) | PR-SSVM (*) | Our Approach |
---|---|---|---|---|
close | ||||
crack | ||||
none | ||||
open | ||||
pour | ||||
put | ||||
read | ||||
spray | ||||
stir | ||||
switch_on | ||||
take | ||||
twist_off | ||||
twist_on | ||||
walk |
Config | Precision | Recall | F-Measure
---|---|---|---
LR_EARLY | 0.430 | 0.326 | 0.337
LR_LATE | 0.378 | 0.323 | 0.329
RF_EARLY | 0.831 | 0.604 | 0.664
RF_LATE | 0.572 | 0.626 | 0.574
Class | Precision | Recall | F-Measure
---|---|---|---
clean | | |
close | | |
fill | | |
open | | |
other | | |
put | | |
shake | | |
stir | | |
take | | |
turn_on | | |
walk | | |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).