DOI: 10.1145/2493432.2493482 (UbiComp conference proceedings)

Research article

Combining embedded accelerometers with computer vision for recognizing food preparation activities

Published: 08 September 2013

Abstract

This paper introduces a publicly available dataset of complex activities that involve manipulative gestures. The dataset captures people preparing mixed salads and contains more than 4.5 hours of accelerometer and RGB-D video data, detailed annotations, and an evaluation protocol for comparison of activity recognition algorithms. Providing baseline results for one possible activity recognition task, this paper further investigates modality fusion methods at different stages of the recognition pipeline: (i) prior to feature extraction through accelerometer localization, (ii) at feature level via feature concatenation, and (iii) at classification level by combining classifier outputs. Empirical evaluation shows that fusing information captured by these sensor types can considerably improve recognition performance.
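The fusion stages described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the feature dimensions, the five-class label set, and the stand-in classifier posteriors are all hypothetical. Stage (ii) is per-sample feature concatenation; stage (iii) combines modality-specific classifier outputs, here by averaging per-class probabilities.

```python
import numpy as np

# Hypothetical per-window feature vectors for the two modalities
# (dimensions are illustrative, not taken from the paper).
rng = np.random.default_rng(0)
accel_feats = rng.normal(size=(100, 12))  # e.g. statistics over accelerometer windows
video_feats = rng.normal(size=(100, 64))  # e.g. histograms of visual features per window

# (ii) Feature-level fusion: concatenate each window's feature vectors
# before training a single classifier on the joint representation.
fused = np.concatenate([accel_feats, video_feats], axis=1)  # shape (100, 76)

# (iii) Classification-level fusion: combine per-class scores from
# classifiers trained separately on each modality. Dirichlet draws
# stand in for the posteriors of two hypothetical 5-class classifiers.
p_accel = rng.dirichlet(np.ones(5), size=100)  # shape (100, 5), rows sum to 1
p_video = rng.dirichlet(np.ones(5), size=100)
p_fused = 0.5 * (p_accel + p_video)  # average rule; product or max are alternatives
labels = p_fused.argmax(axis=1)      # fused per-window class decisions
```

Averaging posteriors keeps the fused scores interpretable as probabilities, while feature concatenation lets a single classifier learn cross-modal interactions at the cost of a higher-dimensional input.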





          Published In

          UbiComp '13: Proceedings of the 2013 ACM international joint conference on Pervasive and ubiquitous computing
          September 2013
          846 pages
          ISBN:9781450317702
          DOI:10.1145/2493432
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Author Tags

          1. accelerometers
          2. activity recognition
          3. computer vision
          4. multi-modal dataset
          5. sensor fusion


          Conference

          UbiComp '13

          Acceptance Rates

UbiComp '13 paper acceptance rate: 92 of 394 submissions (23%)
Overall acceptance rate: 764 of 2,912 submissions (26%)


          Article Metrics

• Downloads (last 12 months): 171
• Downloads (last 6 weeks): 16
          Reflects downloads up to 18 Nov 2024


Cited By
• (2025) Global Spatial-Temporal Information Encoder-Decoder Based Action Segmentation in Untrimmed Video. Tsinghua Science and Technology, 30(1):290-302. DOI: 10.26599/TST.2024.9010041. Online publication date: Feb 2025.
• (2024) PrivShieldROS: An Extended Robot Operating System Integrating Ethereum and Interplanetary File System for Enhanced Sensor Data Privacy. Sensors, 24(10):3241. DOI: 10.3390/s24103241. Online publication date: 20 May 2024.
• (2024) Boundary-Match U-Shaped Temporal Convolutional Network for Vulgar Action Segmentation. Mathematics, 12(6):899. DOI: 10.3390/math12060899. Online publication date: 18 Mar 2024.
• (2024) A Comprehensive Survey on Deep Learning Methods in Human Activity Recognition. Machine Learning and Knowledge Extraction, 6(2):842-876. DOI: 10.3390/make6020040. Online publication date: 18 Apr 2024.
• (2024) A Survey on Video Diffusion Models. ACM Computing Surveys, 57(2):1-42. DOI: 10.1145/3696415. Online publication date: 18 Sep 2024.
• (2024) 3D Pose-Based Temporal Action Segmentation for Figure Skating: A Fine-Grained and Jump Procedure-Aware Annotation Approach. Proceedings of the 7th ACM International Workshop on Multimedia Content Analysis in Sports, 17-26. DOI: 10.1145/3689061.3689077. Online publication date: 28 Oct 2024.
• (2024) From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation. ACM Transactions on Multimedia Computing, Communications, and Applications, 20(11):1-19. DOI: 10.1145/3687474. Online publication date: 28 Aug 2024.
• (2024) Brief Introduction of the OpenPack Dataset and Lessons Learned from Organizing Activity Recognition Challenge Using the Dataset. Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 116-120. DOI: 10.1145/3675094.3677597. Online publication date: 5 Oct 2024.
• (2024) Toward Long Form Audio-Visual Video Understanding. ACM Transactions on Multimedia Computing, Communications, and Applications, 20(9):1-26. DOI: 10.1145/3672079. Online publication date: 7 Jun 2024.
• (2024) CookingINWild: Unleashing the Challenges of Indian Cuisine Cooking Videos for Action Recognition. Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), 222-226. DOI: 10.1145/3632410.3632429. Online publication date: 4 Jan 2024.