TOSD: A Hierarchical Object-Centric Descriptor Integrating Shape, Color, and Topology

Bibliographic Details
Main Authors:	Jun-Hyeon Choi, Jeong-Won Pyo, Ye-Chan An, Tae-Yong Kuc
Format:	Article
Language:	English
Published:	MDPI AG 2025-07-01
Series:	Sensors
Subjects:	hierarchical descriptor; visual representation; scene understanding; object pooling; feature aggregation
Online Access:	https://www.mdpi.com/1424-8220/25/15/4614

Description
Summary:	This paper introduces a hierarchical object-centric descriptor framework called TOSD (Triplet Object-Centric Semantic Descriptor). The goal of this method is to overcome the limitations of existing pixel-based and global feature embedding approaches. To this end, the framework adopts a hierarchical representation that is explicitly designed for multi-level reasoning. TOSD combines shape, color, and topological information without depending on predefined class labels. The shape descriptor captures the geometric configuration of each object. The color descriptor focuses on internal appearance by extracting normalized color features. The topology descriptor models the spatial and semantic relationships between objects in a scene. These components are integrated at both object and scene levels to produce compact and consistent embeddings. The resulting representation covers three levels of abstraction: low-level pixel details, mid-level object features, and high-level semantic structure. This hierarchical organization makes it possible to represent both local cues and global context in a unified form. We evaluate the proposed method on multiple vision tasks. The results show that TOSD performs competitively compared to baseline methods, while maintaining robustness in challenging cases such as occlusion and viewpoint changes. The framework is applicable to visual odometry, SLAM, object tracking, global localization, scene clustering, and image retrieval. In addition, this work extends our previous research on the Semantic Modeling Framework, which represents environments using layered structures of places, objects, and their ontological relations.
ISSN:	1424-8220
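
Illustrative sketch:	The summary describes per-object shape and color descriptors plus a scene-level topology descriptor, combined at object and scene level. The Python sketch below shows one way such a combination could be wired together. It is not the authors' implementation: the record does not say how each descriptor is computed, so every function name and every feature choice here (bounding-box shape statistics, per-channel color histograms, normalized centroid distances, mean pooling) is an assumption made for illustration only.

```python
import numpy as np

# NOTE: all names and feature choices below are hypothetical stand-ins,
# not the descriptors defined in the TOSD paper.

def shape_descriptor(mask):
    """Assumed shape cue: area fraction, bounding-box aspect ratio, extent."""
    ys, xs = np.nonzero(mask)
    h = ys.max() - ys.min() + 1
    w = xs.max() - xs.min() + 1
    area = float(mask.sum())
    return np.array([area / mask.size, w / h, area / (h * w)])

def color_descriptor(image, mask, bins=4):
    """Assumed color cue: per-channel histogram of object pixels, summing to 1."""
    pixels = image[mask]  # (n_pixels, 3), values in [0, 255]
    hists = [np.histogram(pixels[:, c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    hist = np.concatenate(hists).astype(float)
    return hist / hist.sum()

def topology_descriptor(masks):
    """Assumed topology cue: pairwise centroid distances over the image diagonal."""
    centroids = np.array([np.argwhere(m).mean(axis=0) for m in masks])
    diag = np.hypot(*masks[0].shape)
    diff = centroids[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diff, axis=-1) / diag

def describe_scene(image, masks):
    """Return object-level embeddings, the topology matrix, and a scene embedding."""
    objects = np.stack([
        np.concatenate([shape_descriptor(m), color_descriptor(image, m)])
        for m in masks
    ])
    topo = topology_descriptor(masks)
    pairs = topo[np.triu_indices(len(masks), k=1)]
    mean_dist = pairs.mean() if pairs.size else 0.0
    # Scene level: mean-pool the object embeddings and append the mean pair distance.
    scene = np.concatenate([objects.mean(axis=0), [mean_dist]])
    return objects, topo, scene
```

The inputs are assumed to be an H x W x 3 uint8 image and a list of boolean H x W masks, one per object, produced by any class-agnostic segmenter. No class labels are used, which matches the label-free design stated in the summary. Everything else is a placeholder that a real system would replace with the paper's own descriptors.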

