Mixed graph convolution and residual transformation network for skeleton-based action recognition
Department
Computer Science
Document Type
Article
Publication Date
1-1-2022
Abstract
Action recognition based on human skeletons is a highly challenging research problem, and the temporal information contained in a skeleton sequence is more difficult to extract than the spatial information. Many researchers have focused on graph convolutional networks and applied them to action recognition. In this study, a two-stream action recognition method called RNXt-GCN is proposed on the basis of the Spatial-Temporal Graph Convolutional Network (ST-GCN). The human skeleton is first converted into a spatial-temporal graph and a SkeleMotion image, which are input into ST-GCN and ResNeXt, respectively, for spatial-temporal convolution. The convolved features are then fused. The proposed method models the temporal information of an action through its amplitude and direction, addressing the isolated treatment of temporal information in ST-GCN. Experiments are comprehensively performed on four datasets: 1) UTD-MHAD, 2) Northwestern-UCLA, 3) NTU RGB+D 60, and 4) NTU RGB+D 120. The proposed model shows very competitive results compared with other models in our experiments. On the NTU RGB+D 120 dataset, the proposed model outperforms state-of-the-art two-stream models.
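The abstract describes fusing the convolved features of the two streams (ST-GCN and ResNeXt). The paper does not specify the fusion operator here, so the following is only a minimal sketch of one common choice for two-stream models, score-level weighted fusion, with an illustrative weight `alpha` and toy class scores that are purely hypothetical:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_scores(gcn_logits, resnext_logits, alpha=0.5):
    """Weighted score-level fusion of two streams.
    alpha is a hypothetical mixing weight, not a value from the paper."""
    return alpha * softmax(gcn_logits) + (1.0 - alpha) * softmax(resnext_logits)

# Toy example with 3 action classes (illustrative numbers only)
gcn = np.array([2.0, 0.5, 0.1])   # scores from the ST-GCN stream
res = np.array([1.5, 1.0, 0.2])   # scores from the ResNeXt/SkeleMotion stream
fused = fuse_scores(gcn, res)
pred = int(np.argmax(fused))      # predicted action class
```

Feature-level fusion (concatenating the stream embeddings before a final classifier) is the other common option; which one RNXt-GCN uses is detailed in the full paper.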
Journal Title
Applied Intelligence
Journal ISSN
0924-669X
Volume
52
Issue
2
First Page
1544
Last Page
1555
Digital Object Identifier (DOI)
10.1007/s10489-021-02517-w