Contrastive Feature Bin Loss for Monocular Depth Estimation

Monocular depth estimation has recently achieved notable performance using encoder-decoder-based models. These models are typically trained with the Scale-Invariant Logarithmic (SILog) loss, which has led to significant performance improvements. However, because the SILog loss is designed to reduce error variance, it can mislead the model. To address this problem, we propose the Contrastive Feature Bin (CFB) loss as an additional regularization term. The CFB loss guards against incorrect learning by ensuring that similar depths are learned similarly; it can be easily integrated into various encoder-decoder-based models and greatly enhances overall performance. Another common problem is that existing monocular depth estimation models can demand a significant amount of memory during training. Reducing memory consumption by using smaller batch sizes, however, can noticeably degrade performance, compromising reproducibility and practicality. The CFB loss allows encoder-decoder-based models to achieve comparable or even superior performance at smaller batch sizes, with only a modest increase in training time. Our approach improves the performance of diverse monocular depth estimation models on the NYU Depth v2 and KITTI Eigen split datasets. Notably, with a small batch size, it achieves up to an 11% improvement in RMSE over existing methods. The code is available on GitHub.
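
The abstract refers to the standard SILog loss, so a brief refresher may help. In its common form (Eigen et al., 2014), with d_i the per-pixel log-depth error, the loss is alpha * sqrt(mean(d_i^2) - lambda * mean(d_i)^2); because lambda < 1, a prediction that is off by a uniform log-scale factor is only partially penalized, which is the variance-reducing behavior the authors argue can mislead the model. Below is a minimal PyTorch sketch; alpha = 10 and lambda = 0.85 are conventions popularized by follow-up work (e.g., BTS, AdaBins), not values taken from this paper.

```python
import torch

def silog_loss(pred, target, lam=0.85, alpha=10.0, eps=1e-6):
    """Scale-Invariant Logarithmic loss over valid (target > 0) pixels."""
    mask = target > eps
    d = torch.log(pred[mask] + eps) - torch.log(target[mask] + eps)
    # Variance-style form: the subtracted second term partially forgives a
    # uniform log-scale offset, penalizing mainly the *spread* of the error.
    return alpha * torch.sqrt((d ** 2).mean() - lam * d.mean() ** 2)
```

The abstract does not spell out the CFB loss itself, so the following is only a hypothetical illustration of the idea it describes: pixels whose ground-truth depths fall in the same bin should have similar features. It is written as an InfoNCE-style contrastive regularizer over log-spaced depth bins; the function name, the binning scheme, and the prototype construction are all assumptions for illustration, and the paper's actual formulation should be taken from the linked article and its GitHub code.

```python
import math
import torch
import torch.nn.functional as F

def contrastive_bin_loss(features, depth, n_bins=64,
                         d_min=1e-3, d_max=10.0, tau=0.1):
    """Hypothetical contrastive regularizer over depth bins (NOT the paper's
    exact CFB loss). features: (N, C) per-pixel embeddings from the decoder;
    depth: (N,) valid ground-truth depths for the same pixels.
    """
    # Assign each pixel to a log-spaced depth bin.
    log_d = torch.log(depth.clamp(d_min, d_max))
    edges = torch.linspace(math.log(d_min), math.log(d_max),
                           n_bins + 1, device=depth.device)
    bins = torch.bucketize(log_d, edges[1:-1])  # indices in [0, n_bins)

    # One mean-feature "prototype" per occupied bin.
    occupied = bins.unique()  # sorted (B,)
    protos = torch.stack([features[bins == b].mean(dim=0) for b in occupied])
    protos = F.normalize(protos, dim=1)   # (B, C)
    feats = F.normalize(features, dim=1)  # (N, C)

    # InfoNCE-style objective: each pixel's feature should be closest to the
    # prototype of its own depth bin, pulling same-bin pixels together and
    # pushing different bins apart.
    logits = feats @ protos.t() / tau               # (N, B)
    target = torch.searchsorted(occupied, bins)     # map bin id -> column
    return F.cross_entropy(logits, target)
```

In use, a regularizer of this kind would be added to the primary objective with a weighting coefficient, e.g. loss = silog_loss(pred, target) + beta * contrastive_bin_loss(feats, depth), where beta is a tunable hyperparameter (also an assumption here, not a value from the paper).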

Bibliographic Details
Main Authors: Jihun Song (https://orcid.org/0009-0001-2139-8623), Yoonsuk Hyun (https://orcid.org/0000-0001-5047-7139)
Affiliation: Department of Mathematics, Inha University, Incheon, South Korea
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access, Vol. 13, pp. 49584-49596
Subjects: Monocular depth estimation; contrastive learning; memory-efficient training
Online Access: https://ieeexplore.ieee.org/document/10926715/
DOI: 10.1109/ACCESS.2025.3551435