DOI: 10.1145/2493432.2493482 (UbiComp conference proceedings)

Research article

Combining embedded accelerometers with computer vision for recognizing food preparation activities

Published: 08 September 2013

Abstract

This paper introduces a publicly available dataset of complex activities that involve manipulative gestures. The dataset captures people preparing mixed salads and contains more than 4.5 hours of accelerometer and RGB-D video data, detailed annotations, and an evaluation protocol for comparison of activity recognition algorithms. Providing baseline results for one possible activity recognition task, this paper further investigates modality fusion methods at different stages of the recognition pipeline: (i) prior to feature extraction through accelerometer localization, (ii) at feature level via feature concatenation, and (iii) at classification level by combining classifier outputs. Empirical evaluation shows that fusing information captured by these sensor types can considerably improve recognition performance.
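The fusion stages described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the feature dimensions, the five-class label set, and the stand-in classifier posteriors are all hypothetical. Stage (ii) is per-sample feature concatenation; stage (iii) combines modality-specific classifier outputs, here by averaging per-class probabilities.

```python
import numpy as np

# Hypothetical per-window feature vectors for the two modalities
# (dimensions are illustrative, not taken from the paper).
rng = np.random.default_rng(0)
accel_feats = rng.normal(size=(100, 12))  # e.g. statistics over accelerometer windows
video_feats = rng.normal(size=(100, 64))  # e.g. histograms of visual features per window

# (ii) Feature-level fusion: concatenate each window's feature vectors
# before training a single classifier on the joint representation.
fused = np.concatenate([accel_feats, video_feats], axis=1)  # shape (100, 76)

# (iii) Classification-level fusion: combine per-class scores from
# classifiers trained separately on each modality. Dirichlet draws
# stand in for the posteriors of two hypothetical 5-class classifiers.
p_accel = rng.dirichlet(np.ones(5), size=100)  # shape (100, 5), rows sum to 1
p_video = rng.dirichlet(np.ones(5), size=100)
p_fused = 0.5 * (p_accel + p_video)  # average rule; product or max are alternatives
labels = p_fused.argmax(axis=1)      # fused per-window class decisions
```

Averaging posteriors keeps the fused scores interpretable as probabilities, while feature concatenation lets a single classifier learn cross-modal interactions at the cost of a higher-dimensional input.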





          Published In

          UbiComp '13: Proceedings of the 2013 ACM international joint conference on Pervasive and ubiquitous computing
          September 2013
          846 pages
          ISBN:9781450317702
          DOI:10.1145/2493432
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Author Tags

          1. accelerometers
          2. activity recognition
          3. computer vision
          4. multi-modal dataset
          5. sensor fusion


          Conference

          UbiComp '13

          Acceptance Rates

UbiComp '13 paper acceptance rate: 92 of 394 submissions (23%)
Overall acceptance rate: 764 of 2,912 submissions (26%)


          Article Metrics

• Downloads (last 12 months): 171
• Downloads (last 6 weeks): 16
          Reflects downloads up to 18 Nov 2024


Cited By
• (2025) Global Spatial-Temporal Information Encoder-Decoder Based Action Segmentation in Untrimmed Video. Tsinghua Science and Technology, 30(1):290-302. DOI: 10.26599/TST.2024.9010041. Online publication date: Feb 2025.
• (2024) PrivShieldROS: An Extended Robot Operating System Integrating Ethereum and Interplanetary File System for Enhanced Sensor Data Privacy. Sensors, 24(10):3241. DOI: 10.3390/s24103241. Online publication date: 20 May 2024.
• (2024) Boundary-Match U-Shaped Temporal Convolutional Network for Vulgar Action Segmentation. Mathematics, 12(6):899. DOI: 10.3390/math12060899. Online publication date: 18 Mar 2024.
• (2024) A Comprehensive Survey on Deep Learning Methods in Human Activity Recognition. Machine Learning and Knowledge Extraction, 6(2):842-876. DOI: 10.3390/make6020040. Online publication date: 18 Apr 2024.
• (2024) A Survey on Video Diffusion Models. ACM Computing Surveys, 57(2):1-42. DOI: 10.1145/3696415. Online publication date: 18 Sep 2024.
• (2024) 3D Pose-Based Temporal Action Segmentation for Figure Skating: A Fine-Grained and Jump Procedure-Aware Annotation Approach. Proceedings of the 7th ACM International Workshop on Multimedia Content Analysis in Sports, 17-26. DOI: 10.1145/3689061.3689077. Online publication date: 28 Oct 2024.
• (2024) From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation. ACM Transactions on Multimedia Computing, Communications, and Applications, 20(11):1-19. DOI: 10.1145/3687474. Online publication date: 28 Aug 2024.
• (2024) Brief Introduction of the OpenPack Dataset and Lessons Learned from Organizing Activity Recognition Challenge Using the Dataset. Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 116-120. DOI: 10.1145/3675094.3677597. Online publication date: 5 Oct 2024.
• (2024) Toward Long Form Audio-Visual Video Understanding. ACM Transactions on Multimedia Computing, Communications, and Applications, 20(9):1-26. DOI: 10.1145/3672079. Online publication date: 7 Jun 2024.
• (2024) CookingINWild: Unleashing the Challenges of Indian Cuisine Cooking Videos for Action Recognition. Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), 222-226. DOI: 10.1145/3632410.3632429. Online publication date: 4 Jan 2024.