Video description method based on multidimensional and multimodal information

Bibliographic Details
Main Authors:	Enjie DING, Zhongyu LIU, Yafeng LIU, Wanli YU
Format:	Article
Language:	Chinese (zho)
Published:	Editorial Department of Journal on Communications 2020-02-01
Series:	Tongxin xuebao
Subjects:	video description; multimodal; transfer learning; long short-term memory network
Online Access:	http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2020037/

author	Enjie DING Zhongyu LIU Yafeng LIU Wanli YU
collection	DOAJ
description	To address the problem of complex information representation in automatic video description, a multi-dimensional, multi-modal visual feature extraction and fusion method is proposed. First, multi-dimensional features of the video sequence, such as static and dynamic attributes, are extracted by transfer learning, and an image description algorithm is used to extract the semantic information of the key frames in the video; together these steps constitute the video feature extraction. Then, multi-layer long short-term memory (LSTM) networks fuse the multi-dimensional and multi-modal information and generate a language description of the video content. Experimental simulation results show that the proposed method achieves better results than existing methods on the automatic video description task.
format	Article
id	doaj-art-0f3311aef9ed4275907447e043ef3509
institution	Kabale University
issn	1000-436X
language	zho
publishDate	2020-02-01
publisher	Editorial Department of Journal on Communications
record_format	Article
series	Tongxin xuebao
title	Video description method based on multidimensional and multimodal information
topic	video description; multimodal; transfer learning; long short-term memory network
url	http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2020037/
