Video description method based on multidimensional and multimodal information

To solve the problem of complex information representation in automatic video description tasks, a multi-dimensional and multi-modal visual feature extraction and fusion method was proposed. First, multi-dimensional features such as static and dynamic attributes of the video sequence were ex...

Bibliographic Details
Main Authors: Enjie DING, Zhongyu LIU, Yafeng LIU, Wanli YU
Format: Article
Language: zho
Published: Editorial Department of Journal on Communications 2020-02-01
Series: Tongxin xuebao
Subjects:
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2020037/
collection DOAJ
description To solve the problem of complex information representation in automatic video description tasks, a multi-dimensional and multi-modal visual feature extraction and fusion method was proposed. First, multi-dimensional features, such as the static and dynamic attributes of the video sequence, were extracted by transfer learning, and an image description algorithm was used to extract semantic information from the key frames of the video; together, these steps completed the video feature extraction. Then, multi-layer long short-term memory (LSTM) networks were used to fuse the multi-dimensional and multi-modal information and to generate a language description of the video content. Experimental simulation results show that the proposed method achieves better results than existing methods on the automatic video description task.
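The fusion step named in the description can be illustrated with a small sketch. The record does not state how the features are combined, what sizes the layers have, or how the decoder is built, so everything below is an assumption: the static, dynamic, and semantic features are simply concatenated per time step, the weights are random and untrained, and the names `LSTMCell` and `fuse_sequence` are hypothetical. It shows only the general shape of stacked-LSTM fusion, not the authors' implementation, and it omits the language-generation stage.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def step(self, x, h, c):
        z = x + h  # concatenate the input with the previous hidden state
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        g = [math.tanh(v) for v in self._affine("c", z)]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def fuse_sequence(static_feats, dynamic_feats, semantic_feats,
                  hidden_size=8, num_layers=2, seed=0):
    """Concatenate per-step features (assumed fusion) and run stacked LSTMs.

    Each argument is a list with one feature vector (list of floats) per
    time step. Returns the top layer's hidden state at every time step.
    """
    rng = random.Random(seed)
    inputs = [s + d + m for s, d, m
              in zip(static_feats, dynamic_feats, semantic_feats)]
    for _ in range(num_layers):
        cell = LSTMCell(len(inputs[0]), hidden_size, rng)
        h = [0.0] * hidden_size
        c = [0.0] * hidden_size
        outputs = []
        for x in inputs:
            h, c = cell.step(x, h, c)
            outputs.append(h)
        inputs = outputs  # this layer's outputs feed the next layer
    return inputs
```

In a full system the returned hidden states would feed a decoder that emits words; that stage, and any training, is outside this sketch.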
id doaj-art-0f3311aef9ed4275907447e043ef3509
institution Kabale University
issn 1000-436X
topic video description
multimodal
transfer learning
long short-term memory network