Abstract
The task of recognizing human behavior in a collaborative robotic system is crucial for organizing seamless and productive collaboration. We design a vision system for an industrial scenario of riveting a metal plate and concentrate on the task of recognizing in-hand mechanical tools. However, hand-object interaction causes severe occlusion. Incorporating attention modules into the backbone is a common way to handle occlusion and to enhance the extraction of features with contextual information. Accordingly, we propose three modified occlusion-aware models based on YOLOv5 for in-hand mechanical tool recognition: adding SimAM to each bottleneck in the backbone, inserting a Criss-Cross attention layer between the last C3 block and the SPPF block of the backbone, and replacing the last C3 block of the backbone with a Criss-Cross attention layer. We create a dataset specifically for the task of in-hand mechanical tool recognition and validate the three modified models after training them separately; the results demonstrate the effectiveness of the SimAM module and the ineffectiveness of the Criss-Cross attention module. Real-time detection remains imperfect when hands occlude the tool from various directions.
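For context on the first modification, SimAM is a parameter-free attention module that reweights each activation by an energy-based saliency term. The PyTorch sketch below follows the published SimAM formulation (Yang et al., ICML 2021); the class name and the idea of applying it at the end of a YOLOv5 bottleneck are illustrative assumptions, not the authors' exact code.

```python
import torch


class SimAM(torch.nn.Module):
    """Parameter-free attention (Yang et al., ICML 2021): each neuron is
    weighted by an inverse-energy score and gated with a sigmoid."""

    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda  # regularizer from the SimAM paper

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w - 1
        # Squared deviation of each activation from its channel mean.
        d = (x - x.mean(dim=[2, 3], keepdim=True)).pow(2)
        # Channel-wise variance estimate over spatial positions.
        v = d.sum(dim=[2, 3], keepdim=True) / n
        # Inverse energy: lower energy marks a more distinctive neuron.
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)


# Hypothetical usage on a YOLOv5-style bottleneck output:
feat = torch.randn(1, 64, 32, 32)  # (batch, channels, H, W)
out = SimAM()(feat)                # same shape, attention-reweighted
```

Because SimAM adds no learnable parameters, inserting it into every backbone bottleneck leaves the model's parameter count unchanged, which is one plausible reason it transfers well to the occlusion-heavy setting described above.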
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, G., Shen, X., Serebrenny, V. (2023). Attention Guided In-hand Mechanical Tools Recognition in Human-Robot Collaborative Process. In: Ronzhin, A., Sadigov, A., Meshcheryakov, R. (eds) Interactive Collaborative Robotics. ICR 2023. Lecture Notes in Computer Science, vol. 14214. Springer, Cham. https://doi.org/10.1007/978-3-031-43111-1_1
DOI: https://doi.org/10.1007/978-3-031-43111-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43110-4
Online ISBN: 978-3-031-43111-1
eBook Packages: Computer Science, Computer Science (R0)