2024 5th International Conference on Mobile Computing and Sustainable Informatics (ICMCSI)

Abstract

In response to the ever-increasing volume of video data, this research introduces a deep learning model for annotating behavioral image data. Using RGB-D multimodal cameras and deep learning techniques, the proposed model integrates skeleton sequences and video sequences to improve action recognition. The study explores global information modeling and action moving-image annotation models to efficiently extract and encode key features of human actions. In the proposed model, keyframe extraction is crucial, and Multi-Head Attention (MHA) is used for efficiency. MHA linearly maps queries, keys, and values into different subspaces and computes weighted sums to form context vectors. The video annotation algorithm comprises data preprocessing, feature extraction, and an attention mechanism. Training involves double-loop traversal, word vector replacement, and model evaluation, with MHA strengthening the alignment between videos and sentences. Experimental results on the NTU RGB+D dataset demonstrate the superiority of the model, with an accuracy of over 99%. This innovative approach contributes new tools for the automatic annotation of action image data, with potential applications in intelligent surveillance, human-computer interaction, and autonomous driving.
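Since the abstract describes MHA only at a high level, the following is a minimal sketch of standard scaled dot-product multi-head attention in the sense of the description above (queries, keys, and values linearly mapped into subspaces, with weighted sums forming context vectors). The module name, dimensions, and the pairing of keyframe features with caption word vectors are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Standard multi-head attention sketch: queries, keys, and values are
    linearly mapped into num_heads subspaces, per-head attention weights are
    computed by scaled dot products, and the resulting weighted sums (context
    vectors) are concatenated and projected back to the model dimension."""

    def __init__(self, d_model: int = 512, num_heads: int = 8):
        super().__init__()
        assert d_model % num_heads == 0
        self.d_k = d_model // num_heads
        self.num_heads = num_heads
        self.w_q = nn.Linear(d_model, d_model)  # query projection
        self.w_k = nn.Linear(d_model, d_model)  # key projection
        self.w_v = nn.Linear(d_model, d_model)  # value projection
        self.w_o = nn.Linear(d_model, d_model)  # output projection

    def forward(self, query, key, value):
        b = query.size(0)
        # Project and split into heads: (batch, heads, seq_len, d_k)
        q = self.w_q(query).view(b, -1, self.num_heads, self.d_k).transpose(1, 2)
        k = self.w_k(key).view(b, -1, self.num_heads, self.d_k).transpose(1, 2)
        v = self.w_v(value).view(b, -1, self.num_heads, self.d_k).transpose(1, 2)
        # Scaled dot-product attention per head
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5
        weights = scores.softmax(dim=-1)
        context = weights @ v  # weighted sums -> per-head context vectors
        # Concatenate heads and project back to d_model
        context = context.transpose(1, 2).contiguous().view(b, -1, self.num_heads * self.d_k)
        return self.w_o(context)

# Hypothetical usage: keyframe features attend over caption word vectors,
# mirroring the video-sentence alignment role described in the abstract.
mha = MultiHeadAttention(d_model=512, num_heads=8)
frame_feats = torch.randn(2, 16, 512)  # 2 clips x 16 keyframes (illustrative)
word_embeds = torch.randn(2, 20, 512)  # 2 captions x 20 word vectors (illustrative)
out = mha(frame_feats, word_embeds, word_embeds)
print(out.shape)  # torch.Size([2, 16, 512])
```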