
Attention Guided In-hand Mechanical Tools Recognition in Human-Robot Collaborative Process

  • Conference paper
Interactive Collaborative Robotics (ICR 2023)

Abstract

Recognition of human behavior in a collaborative robotic system is crucial for organizing seamless and productive collaboration. We design a vision system for an industrial scenario of riveting a metal plate and concentrate on the task of recognizing in-hand mechanical tools. However, hand-object interaction causes severe occlusion. Incorporating attention modules into the backbone is a common way to handle occlusion and to enhance the extraction of features with contextual information. Accordingly, we propose three occlusion-aware models based on YOLOv5 for in-hand mechanical tool recognition: adding SimAM to each bottleneck of the backbone, inserting a Criss-Cross attention layer between the last C3 block and the SPPF block of the backbone, and replacing the last C3 block of the backbone with a Criss-Cross attention layer. We create a dataset specifically for in-hand mechanical tool recognition and validate each modified model after separate training; the results demonstrate the effectiveness of the SimAM module and the ineffectiveness of the Criss-Cross attention module. Real-time detection remains imperfect when the hands occlude the tool from various directions.
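As an illustration of the first modification, below is a minimal PyTorch sketch of SimAM together with a YOLOv5-style residual bottleneck that applies it. The SimAM energy weighting follows Yang et al. (ICML 2021); the `BottleneckSimAM` class, its channel arguments, and the placement of the attention before the residual shortcut are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention (Yang et al., ICML 2021).

    Each activation is re-weighted by a closed-form energy term,
    so the module adds no learnable parameters to the backbone.
    """
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda  # regularization constant from the SimAM paper

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); n = number of spatial positions minus one
        _, _, h, w = x.shape
        n = h * w - 1
        # squared deviation of each position from its per-channel mean
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # per-channel variance estimate
        v = d.sum(dim=(2, 3), keepdim=True) / n
        # low-energy (distinctive) positions receive weights closer to 1
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)

class BottleneckSimAM(nn.Module):
    """Hypothetical YOLOv5-style bottleneck with SimAM attached.

    Mirrors the Conv(1x1) -> Conv(3x3) structure of YOLOv5's Bottleneck;
    applying the attention before the shortcut is an assumption.
    """
    def __init__(self, c1: int, c2: int, shortcut: bool = True):
        super().__init__()
        self.cv1 = nn.Sequential(nn.Conv2d(c1, c2, 1, bias=False),
                                 nn.BatchNorm2d(c2), nn.SiLU())
        self.cv2 = nn.Sequential(nn.Conv2d(c2, c2, 3, padding=1, bias=False),
                                 nn.BatchNorm2d(c2), nn.SiLU())
        self.attn = SimAM()
        self.add = shortcut and c1 == c2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.attn(self.cv2(self.cv1(x)))
        return x + y if self.add else y

# Quick shape check: output matches input, since SimAM only re-weights activations.
block = BottleneckSimAM(64, 64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```

Because SimAM is parameter-free, inserting it leaves the model size, and largely the inference cost, of YOLOv5 unchanged, which is consistent with placing it in every backbone bottleneck rather than at a single point.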

Author information

Corresponding author: Guo Wu

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, G., Shen, X., Serebrenny, V. (2023). Attention Guided In-hand Mechanical Tools Recognition in Human-Robot Collaborative Process. In: Ronzhin, A., Sadigov, A., Meshcheryakov, R. (eds) Interactive Collaborative Robotics. ICR 2023. Lecture Notes in Computer Science, vol 14214. Springer, Cham. https://doi.org/10.1007/978-3-031-43111-1_1

  • DOI: https://doi.org/10.1007/978-3-031-43111-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43110-4

  • Online ISBN: 978-3-031-43111-1

  • eBook Packages: Computer Science (R0)
