TOSD: A Hierarchical Object-Centric Descriptor Integrating Shape, Color, and Topology
This paper introduces a hierarchical object-centric descriptor framework called TOSD (Triplet Object-Centric Semantic Descriptor). The goal of this method is to overcome the limitations of existing pixel-based and global feature embedding approaches. To this end, the framework adopts a hierarchical...
Saved in:
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
MDPI AG
2025-07-01
|
| Series: | Sensors |
| Subjects: | |
| Online Access: | https://www.mdpi.com/1424-8220/25/15/4614 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | This paper introduces a hierarchical object-centric descriptor framework called TOSD (Triplet Object-Centric Semantic Descriptor). The goal of this method is to overcome the limitations of existing pixel-based and global feature embedding approaches. To this end, the framework adopts a hierarchical representation that is explicitly designed for multi-level reasoning. TOSD combines shape, color, and topological information without depending on predefined class labels. The shape descriptor captures the geometric configuration of each object. The color descriptor focuses on internal appearance by extracting normalized color features. The topology descriptor models the spatial and semantic relationships between objects in a scene. These components are integrated at both object and scene levels to produce compact and consistent embeddings. The resulting representation covers three levels of abstraction: low-level pixel details, mid-level object features, and high-level semantic structure. This hierarchical organization makes it possible to represent both local cues and global context in a unified form. We evaluate the proposed method on multiple vision tasks. The results show that TOSD performs competitively compared to baseline methods, while maintaining robustness in challenging cases such as occlusion and viewpoint changes. The framework is applicable to visual odometry, SLAM, object tracking, global localization, scene clustering, and image retrieval. In addition, this work extends our previous research on the <i>Semantic Modeling Framework</i>, which represents environments using layered structures of places, objects, and their ontological relations. |
|---|---|
| ISSN: | 1424-8220 |