Video description method based on multidimensional and multimodal information
To address the problem of complex information representation in automatic video description tasks, a multi-dimensional and multi-modal visual feature extraction and fusion method was proposed. Firstly, multi-dimensional features such as static and dynamic attributes of the video sequence were ex...
Main Authors: | Enjie DING, Zhongyu LIU, Yafeng LIU, Wanli YU |
---|---|
Format: | Article |
Language: | zho |
Published: | Editorial Department of Journal on Communications, 2020-02-01 |
Series: | Tongxin xuebao |
Subjects: | video description; multimodal; transfer learning; long short-term memory network |
Online Access: | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2020037/ |
_version_ | 1841539325352738816 |
---|---|
author | Enjie DING, Zhongyu LIU, Yafeng LIU, Wanli YU |
author_sort | Enjie DING |
collection | DOAJ |
description | To address the problem of complex information representation in automatic video description, a multi-dimensional, multi-modal visual feature extraction and fusion method is proposed. First, multi-dimensional features such as the static and dynamic attributes of the video sequence are extracted by transfer learning, and an image description algorithm is used to extract the semantic information of the key frames in the video. Together, these steps complete the video feature extraction. Then, multi-layer long short-term memory (LSTM) networks fuse the multi-dimensional and multi-modal information and generate a language description of the video content. Experimental simulation results show that the proposed method achieves better results than existing methods on the automatic video description task. |
format | Article |
id | doaj-art-0f3311aef9ed4275907447e043ef3509 |
institution | Kabale University |
issn | 1000-436X |
language | zho |
publishDate | 2020-02-01 |
publisher | Editorial Department of Journal on Communications |
record_format | Article |
series | Tongxin xuebao |
spelling | doaj-art-0f3311aef9ed4275907447e043ef3509; 2025-01-14T07:18:32Z; zho; Editorial Department of Journal on Communications; Tongxin xuebao; 1000-436X; 2020-02-01; 41364359732961; Video description method based on multidimensional and multimodal information; Enjie DING; Zhongyu LIU; Yafeng LIU; Wanli YU; To address the problem of complex information representation in automatic video description, a multi-dimensional, multi-modal visual feature extraction and fusion method is proposed. First, multi-dimensional features such as the static and dynamic attributes of the video sequence are extracted by transfer learning, and an image description algorithm is used to extract the semantic information of the key frames in the video. Together, these steps complete the video feature extraction. Then, multi-layer long short-term memory (LSTM) networks fuse the multi-dimensional and multi-modal information and generate a language description of the video content. Experimental simulation results show that the proposed method achieves better results than existing methods on the automatic video description task.; http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2020037/; video description; multimodal; transfer learning; long short-term memory network |
title | Video description method based on multidimensional and multimodal information |
topic | video description; multimodal; transfer learning; long short-term memory network |
url | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2020037/ |