

Vision-based action recognition of earthmoving equipment using spatio-temporal features and support vector machine classifiers

Published: 01 October 2013

Abstract

Highlights:

  • We present a computer-vision-based method for equipment action recognition.
  • Our method is based on multiple binary SVM classifiers and spatio-temporal features.
  • A comprehensive real-world video dataset of excavator and truck actions is presented.
  • We achieve accuracies of 86.33% and 98.33% for excavator and truck action classes.
  • The presented method can be used for construction activity analysis on long video sequences.

Video recordings of earthmoving construction operations provide understandable data that can be used for benchmarking and analyzing their performance. These recordings further support project managers in taking corrective actions on performance deviations and, in turn, improving operational efficiency. Despite these benefits, manual stopwatch studies of previously recorded videos can be labor-intensive, may suffer from observer bias, and are impractical after substantial periods of observation. This paper presents a new computer-vision-based algorithm for recognizing single actions of earthmoving construction equipment. This is a particularly challenging task, as equipment can be partially occluded in site video streams and comes in a wide variety of sizes and appearances. The scale and pose of equipment actions can also vary significantly depending on the camera configuration. In the proposed method, a video is initially represented as a collection of spatio-temporal visual features by extracting space-time interest points and describing each feature with a Histogram of Oriented Gradients (HOG). The algorithm automatically learns the distributions of the spatio-temporal features and action categories using a multi-class Support Vector Machine (SVM) classifier. This strategy handles noisy feature points arising from typical dynamic backgrounds. Given a video sequence captured from a fixed camera, the multi-class SVM classifier recognizes and localizes equipment actions.
For the purpose of evaluation, a new video dataset is introduced which contains 859 sequences of excavator and truck actions. This dataset contains large variations in equipment pose and scale, with varied backgrounds and levels of occlusion. The experimental results, with average accuracies of 86.33% and 98.33%, show that our supervised method outperforms previous algorithms for excavator and truck action recognition. These results hold promise for the applicability of the proposed method to construction activity analysis.
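The pipeline the abstract describes (local spatio-temporal descriptors quantized into a bag-of-features representation, then fed to a multi-class SVM) can be sketched as follows. This is only a toy illustration under stated assumptions: random vectors stand in for the HOG descriptors that a real space-time interest point detector would produce, and the codebook size, descriptor dimensionality, and class means are all hypothetical values, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_video_descriptors(action_id, n_points=40, dim=72):
    # Stand-in for HOG descriptors at detected space-time interest points.
    # Each action class gets a different mean so the toy classes are separable.
    return rng.normal(loc=action_id, scale=0.5, size=(n_points, dim))

n_classes, videos_per_class = 3, 20
videos, labels = [], []
for c in range(n_classes):
    for _ in range(videos_per_class):
        videos.append(make_video_descriptors(c))
        labels.append(c)

# Build a bag-of-features codebook by clustering all descriptors,
# then represent each video as a normalized histogram of codeword hits.
n_words = 30
codebook = KMeans(n_clusters=n_words, n_init=5, random_state=0).fit(np.vstack(videos))

def to_histogram(descriptors):
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / hist.sum()

X = np.array([to_histogram(v) for v in videos])
y = np.array(labels)

# Multi-class SVM over the histograms (scikit-learn's SVC handles
# the multi-class case via pairwise binary classifiers internally).
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(clf.score(X, y))  # training accuracy on this toy data
```

At recognition time, a sliding window over a long video sequence would be converted to the same histogram representation and scored by the classifier, which is what enables the action localization the abstract mentions.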




    Published In

    Advanced Engineering Informatics  Volume 27, Issue 4
    October, 2013
    255 pages

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands


    Author Tags

    1. Action recognition
    2. Activity analysis
    3. Computer vision
    4. Construction productivity
    5. Operational efficiency
    6. Time-studies

    Qualifiers

    • Research-article


    Cited By

    • (2024) Improving single-stage activity recognition of excavators using knowledge distillation of temporal gradient data. Computer-Aided Civil and Infrastructure Engineering 39(13), 2028–2053. doi:10.1111/mice.13157. Online publication date: 9-Jun-2024.
    • (2024) A teacher–student deep learning strategy for extreme low resolution unsafe action recognition in construction projects. Advanced Engineering Informatics 59:C. doi:10.1016/j.aei.2023.102294. Online publication date: 1-Jan-2024.
    • (2023) Intelligent Identification Approach of Vibratory Roller Working Stages Based on Multi-dimensional Convolutional Neural Network. Intelligent Robotics and Applications, 463–475. doi:10.1007/978-981-99-6501-4_40. Online publication date: 5-Jul-2023.
    • (2022) Real-Time Activity Duration Extraction of Crane Works for Data-Driven Discrete Event Simulation. Proceedings of the Winter Simulation Conference, 2365–2376. doi:10.5555/3586210.3586408. Online publication date: 11-Dec-2022.
    • (2022) Optimization of excavator engine working points based on particle swarm algorithm. Proceedings of the Asia Conference on Electrical, Power and Computer Engineering, 1–8. doi:10.1145/3529299.3531508. Online publication date: 22-Apr-2022.
    • (2022) Vision-based method for semantic information extraction in construction by integrating deep learning object detection and image captioning. Advanced Engineering Informatics 53:C. doi:10.1016/j.aei.2022.101699. Online publication date: 1-Aug-2022.
    • (2022) Computer vision-based deep learning for supervising excavator operations and measuring real-time earthwork productivity. The Journal of Supercomputing 79(4), 4468–4492. doi:10.1007/s11227-022-04803-x. Online publication date: 27-Sep-2022.
    • (2021) Automated active and idle time measurement in modular construction factory using inertial measurement unit and deep learning for dynamic simulation input. Proceedings of the Winter Simulation Conference, 1–8. doi:10.5555/3522802.3522989. Online publication date: 13-Dec-2021.
    • (2021) 3D convolutional neural network-based one-stage model for real-time action detection in video of construction equipment. Computer-Aided Civil and Infrastructure Engineering 37(1), 126–142. doi:10.1111/mice.12695. Online publication date: 10-Jun-2021.
    • (2021) DeepHaul: a deep learning and reinforcement learning-based smart automation framework for dump trucks. Progress in Artificial Intelligence 10(2), 157–180. doi:10.1007/s13748-021-00233-7. Online publication date: 10-Feb-2021.
