Computational models reveal that intuitive physics underlies visual processing of soft objects

Bibliographic Details
Main Authors: Wenyan Bi, Aalap D. Shah, Kimberly W. Wong, Brian J. Scholl, Ilker Yildirim
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-61458-x
Description
Summary: Computational explorations of human cognition have been especially successful when applied to visual perception. Existing models have primarily focused on rigid objects, emphasizing shape-preserving invariance to changes in viewpoint, lighting, object size, and scene context. Yet many objects in our everyday environments, such as cloths, are soft. This poses both quantitatively greater and qualitatively different challenges for models of perception, due to soft objects’ dynamic and high-dimensional internal structure, as in the changing folds and wrinkles of a cloth waving in the wind. Soft object perception is also correspondingly rich, involving distinct properties such as stiffness. Here we explore the ability of different kinds of computational models to capture visual perception of the physical properties of cloths (e.g., their degrees of stiffness) undergoing different naturalistic transformations (e.g., falling vs. waving in the wind). Across visual matching tasks, both the successes and failures of human performance are well explained by Woven: a new model that incorporates physics-based simulations to infer probabilistic representations of cloths. Woven outperforms powerful, performance-equated alternatives, including its ablations and a deep neural network, and suggests that humanlike machine vision may also require representations that transcend image statistics, and involve intuitive physics.
ISSN: 2041-1723