A Spatial Transformation Based Next Frame Predictor
Main Authors:
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10823095/
Summary: In recent years, the automobile industry has achieved astonishing success in making autonomous cars safer, more affordable, and more reliable. However, current autonomous driving technology is mainly based on reactive controllers that attempt to respond to the various events the car encounters. Achieving a truly safe and reliable autonomous system requires anticipating such events and planning the correct actions in advance to avoid undesirable behavior. Recent advances in deep learning have shown remarkable performance in predicting future frames from video sequences, but most of these approaches can handle only a few moving elements in the scene and perform poorly when the camera is in motion, mainly because of the difficulty of disentangling the camera's intrinsic motion from object-dependent motion. In this work, we equip autonomous cars with an object-oriented next-frame predictor that leverages a Transformer architecture to extract, for each moving object in the scene, a spatial transformation that is applied to the object to predict its configuration in the next frame. Static elements of the scene are then used to estimate the camera's intrinsic motion, which is applied to the background to predict how it will be viewed in the next frame. Notably, our approach significantly reduces the complexity typically associated with such models by requiring the estimation of only 14 parameters per moving object, independent of image resolution. We have validated the generalization capabilities of our model by training on simulated datasets and testing on real-world datasets. The results indicate that our model not only outperforms existing models trained solely on real data but also exhibits superior resilience to occlusions and incomplete data in the input sequences. These findings underscore the potential of our model to significantly improve the predictive capabilities of autonomous driving systems, thereby enhancing their safety and reliability in dynamic environments.
ISSN: 2169-3536
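The abstract's central idea, warping each moving object with its own low-dimensional spatial transform while warping the static background with an estimated camera-motion transform, can be illustrated with a short sketch. The code below is a hypothetical illustration, not the authors' implementation: it uses plain 2D affine transforms (6 parameters) as a stand-in for the paper's 14-parameter per-object transformation, and PyTorch's `affine_grid`/`grid_sample` for warping; the function names, argument names, and compositing scheme are invented for this example.

```python
# Hypothetical sketch (not the paper's code): predict the next frame by warping
# each moving object with its own spatial transform and warping the static
# background with an estimated camera-motion transform, then compositing.
import torch
import torch.nn.functional as F


def warp(image, theta):
    """Warp a batch of images (N, C, H, W) with 2x3 affine matrices theta (N, 2, 3)."""
    grid = F.affine_grid(theta, image.shape, align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)


def predict_next_frame(frame, object_masks, object_thetas, camera_theta):
    """
    frame:         (1, 3, H, W) current RGB frame
    object_masks:  list of (1, 1, H, W) binary masks, one per moving object
    object_thetas: list of (1, 2, 3) affine parameters, one per moving object
                   (stand-in for the paper's 14 parameters per object)
    camera_theta:  (1, 2, 3) affine parameters approximating camera ego-motion
    """
    # Union of all object masks: everything else is treated as static background.
    full_mask = torch.zeros_like(frame[:, :1])
    for m in object_masks:
        full_mask = torch.clamp(full_mask + m, 0, 1)

    # Background = frame with moving objects removed, warped by camera motion.
    prediction = warp(frame * (1 - full_mask), camera_theta)

    # Each object is warped independently by its own transform and composited on top.
    for mask, theta in zip(object_masks, object_thetas):
        warped_obj = warp(frame * mask, theta)
        warped_mask = warp(mask, theta)
        prediction = prediction * (1 - warped_mask) + warped_obj
    return prediction


if __name__ == "__main__":
    # Tiny smoke test: identity transforms roughly reproduce the input frame.
    frame = torch.rand(1, 3, 64, 64)
    mask = torch.zeros(1, 1, 64, 64)
    mask[..., 20:40, 20:40] = 1.0
    identity = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
    pred = predict_next_frame(frame, [mask], [identity], identity)
    print(pred.shape)  # torch.Size([1, 3, 64, 64])
```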