Supervised Contrastive Learning for 3D Cross-Modal Retrieval

Interoperability between different virtual platforms requires the ability to search and transfer digital assets across platforms. Digital assets in virtual platforms are represented in different forms or modalities, such as images, meshes, and point clouds. The cross-modal retrieval of three-dimensional (3D) object representations is challenging due to data representation diversity, which makes discovering a common feature space difficult. Recent studies have focused on obtaining feature consistency within the same classes and modalities using cross-modal center loss. However, center features are sensitive to hyperparameter variations, making cross-modal center loss susceptible to performance degradation. This paper proposes a new 3D cross-modal retrieval method that uses cross-modal supervised contrastive learning (CSupCon) and the fixed projection head (FPH) strategy. Contrastive learning mitigates the influence of hyperparameters by maximizing feature distinctiveness. The FPH strategy prevents gradient updates in the projection network, enabling the focused training of the backbone networks. The proposed method shows mean average precision (mAP) increases of 1.17 and 0.14 in 3D cross-modal object retrieval experiments on the ModelNet10 and ModelNet40 datasets compared to state-of-the-art (SOTA) methods.

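The two components the abstract names, cross-modal supervised contrastive learning and the fixed projection head, can be sketched briefly. Below is a minimal PyTorch sketch assuming the standard SupCon loss formulation (Khosla et al., 2020) and a linear projection head; the name `supcon_loss`, the 512/128 dimensions, and the 0.07 temperature are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def supcon_loss(features: torch.Tensor, labels: torch.Tensor,
                temperature: float = 0.07) -> torch.Tensor:
    """Supervised contrastive loss over a batch of embeddings.

    In the cross-modal setting, `features` stacks the embeddings that the
    image, mesh, and point-cloud backbones produce for one batch, so
    same-class samples from different modalities count as positives.
    """
    z = F.normalize(features, dim=1)               # compare on the unit sphere
    logits = (z @ z.T) / temperature
    # Numerical stability: subtract each row's max before exponentiating.
    logits = logits - logits.max(dim=1, keepdim=True).values.detach()

    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = ((labels.view(-1, 1) == labels.view(1, -1)) & ~eye).float()

    exp_logits = logits.exp().masked_fill(eye, 0.0)          # drop self-pairs
    log_prob = logits - exp_logits.sum(dim=1, keepdim=True).log()
    # Average log-probability over each anchor's positives, then negate.
    mean_log_prob_pos = (pos * log_prob).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return -mean_log_prob_pos.mean()

# Fixed projection head (FPH): create the projection once and freeze it, so
# gradient updates reach only the modality backbones, as the abstract states.
proj = nn.Linear(512, 128)       # hypothetical 512-d feature -> 128-d embedding
for p in proj.parameters():
    p.requires_grad = False      # no gradient updates in the projection network

# Toy usage: 4 classes seen through 2 modalities (labels repeat across rows).
feats = proj(torch.randn(8, 512))
labels = torch.tensor([0, 1, 2, 3, 0, 1, 2, 3])
loss = supcon_loss(feats, labels)
```

With the projection frozen, the loss gradient passes through `proj` unchanged into the backbones, which is the stated motivation for the FPH strategy.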
Bibliographic Details
Main Authors: Yeon-Seung Choo, Boeun Kim, Hyun-Sik Kim, Yong-Suk Park
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Applied Sciences
ISSN: 2076-3417
DOI: 10.3390/app142210322
Subjects: cross-modal; object retrieval; contrastive learning
Online Access: https://www.mdpi.com/2076-3417/14/22/10322

Author Affiliations:
Yeon-Seung Choo: Contents Convergence Research Center, Korea Electronics Technology Institute (KETI), Seoul 03924, Republic of Korea
Boeun Kim: Artificial Intelligence Research Center, Korea Electronics Technology Institute (KETI), Seongnam 13509, Republic of Korea
Hyun-Sik Kim: Contents Convergence Research Center, Korea Electronics Technology Institute (KETI), Seoul 03924, Republic of Korea
Yong-Suk Park: Contents Convergence Research Center, Korea Electronics Technology Institute (KETI), Seoul 03924, Republic of Korea