Supervised Contrastive Learning for 3D Cross-Modal Retrieval

Interoperability between different virtual platforms requires the ability to search and transfer digital assets across platforms. Digital assets in virtual platforms are represented in different forms or modalities, such as images, meshes, and point clouds. The cross-modal retrieval of three-dimensional (3D) object representations is challenging due to data representation diversity, which makes discovering a common feature space difficult. Recent studies have focused on obtaining feature consistency within the same classes and modalities using cross-modal center loss. However, center features are sensitive to hyperparameter variations, making cross-modal center loss susceptible to performance degradation. This paper proposes a new 3D cross-modal retrieval method that uses cross-modal supervised contrastive learning (CSupCon) and the fixed projection head (FPH) strategy. Contrastive learning mitigates the influence of hyperparameters by maximizing feature distinctiveness. The FPH strategy prevents gradient updates in the projection network, enabling the focused training of the backbone networks. The proposed method shows mean average precision (mAP) increases of 1.17 and 0.14 in 3D cross-modal object retrieval experiments on the ModelNet10 and ModelNet40 datasets compared to state-of-the-art (SOTA) methods.

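The two components the abstract names, cross-modal supervised contrastive learning and the fixed projection head, can be sketched briefly. Below is a minimal PyTorch sketch assuming the standard SupCon loss formulation (Khosla et al., 2020) and a linear projection head; the name `supcon_loss`, the 512/128 dimensions, and the 0.07 temperature are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def supcon_loss(features: torch.Tensor, labels: torch.Tensor,
                temperature: float = 0.07) -> torch.Tensor:
    """Supervised contrastive loss over a batch of embeddings.

    In the cross-modal setting, `features` stacks the embeddings that the
    image, mesh, and point-cloud backbones produce for one batch, so
    same-class samples from different modalities count as positives.
    """
    z = F.normalize(features, dim=1)               # compare on the unit sphere
    logits = (z @ z.T) / temperature
    # Numerical stability: subtract each row's max before exponentiating.
    logits = logits - logits.max(dim=1, keepdim=True).values.detach()

    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = ((labels.view(-1, 1) == labels.view(1, -1)) & ~eye).float()

    exp_logits = logits.exp().masked_fill(eye, 0.0)          # drop self-pairs
    log_prob = logits - exp_logits.sum(dim=1, keepdim=True).log()
    # Average log-probability over each anchor's positives, then negate.
    mean_log_prob_pos = (pos * log_prob).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return -mean_log_prob_pos.mean()

# Fixed projection head (FPH): create the projection once and freeze it, so
# gradient updates reach only the modality backbones, as the abstract states.
proj = nn.Linear(512, 128)       # hypothetical 512-d feature -> 128-d embedding
for p in proj.parameters():
    p.requires_grad = False      # no gradient updates in the projection network

# Toy usage: 4 classes seen through 2 modalities (labels repeat across rows).
feats = proj(torch.randn(8, 512))
labels = torch.tensor([0, 1, 2, 3, 0, 1, 2, 3])
loss = supcon_loss(feats, labels)
```

With the projection frozen, the loss gradient passes through `proj` unchanged into the backbones, which is the stated motivation for the FPH strategy.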
Bibliographic Details
Main Authors: Yeon-Seung Choo, Boeun Kim, Hyun-Sik Kim, Yong-Suk Park
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Applied Sciences
ISSN: 2076-3417
DOI: 10.3390/app142210322
Subjects: cross-modal; object retrieval; contrastive learning
Online Access: https://www.mdpi.com/2076-3417/14/22/10322

Author Affiliations:
Yeon-Seung Choo: Contents Convergence Research Center, Korea Electronics Technology Institute (KETI), Seoul 03924, Republic of Korea
Boeun Kim: Artificial Intelligence Research Center, Korea Electronics Technology Institute (KETI), Seongnam 13509, Republic of Korea
Hyun-Sik Kim: Contents Convergence Research Center, Korea Electronics Technology Institute (KETI), Seoul 03924, Republic of Korea
Yong-Suk Park: Contents Convergence Research Center, Korea Electronics Technology Institute (KETI), Seoul 03924, Republic of Korea