Mapping between the Human Visual System and Two-stream DCNNs in Action Representation

2023

Creative Commons 'BY' version 4.0 license

Abstract

Deep convolutional neural networks (DCNNs) have been found to demonstrate hierarchical mapping to human brain regions on tasks such as object recognition. However, it remains unclear whether such hierarchical mapping also applies to action recognition, which involves dynamic visual information processing. Here, we compared action representations of two-stream DCNNs to those of the human visual system. Five visual areas associated with object and action processing were selected. Nine human action categories, drawn from three semantic classes, were used to examine the action representations of both DCNNs and human visual areas. In two fMRI experiments, actions were presented as computer-rendered videos and as point-light biological motion videos. The results showed that although two-stream DCNNs demonstrated hierarchical representations of actions as layers grew deeper, they lacked a hierarchical mapping to human visual areas. Consistently across video displays and DCNN pathways, only the top DCNN layers demonstrated high similarity to representations in the human visual system. The results suggest that the dynamic representations of human actions may differ between DCNNs and the human visual system, even after the networks are trained on large datasets.

Proceedings of the Annual Meeting of the Cognitive Science Society
