The inconvenient truth of ground truth errors in automotive datasets and DNN-based detection

Assisted and automated driving functions will rely on machine learning algorithms, given their ability to cope with real-world variations, e.g. vehicles of different shapes, positions, colors, and so forth. Supervised learning needs annotated datasets, and several automotive datasets are available....

Bibliographic Details
Main Authors:	Pak Hung Chan, Boda Li, Gabriele Baris, Qasim Sadiq, Valentina Donzella
Format:	Article
Language:	English
Published:	Cambridge University Press 2024-01-01
Series:	Data-Centric Engineering
Subjects:	machine learning; automated vehicles; automotive dataset; labeling
Online Access:	https://www.cambridge.org/core/product/identifier/S263267362400039X/type/journal_article

author	Pak Hung Chan Boda Li Gabriele Baris Qasim Sadiq Valentina Donzella
collection	DOAJ
description	Assisted and automated driving functions will rely on machine learning algorithms, given their ability to cope with real-world variations, e.g. vehicles of different shapes, positions, colors, and so forth. Supervised learning needs annotated datasets, and several automotive datasets are available. However, these datasets are tremendous in volume, and labeling accuracy and quality can vary across different datasets and within dataset frames. Accurate and appropriate ground truth is especially important for automotive applications, as “incomplete” or “incorrect” learning can negatively impact vehicle safety when these neural networks are deployed. This work investigates the ground truth quality of widely adopted automotive datasets, including a detailed analysis of KITTI MoSeg. Based on the errors identified and classified in the annotations of different automotive datasets, this article provides three different sets of criteria for producing improved annotations. These criteria are enforceable and applicable to a wide variety of datasets. The three annotation sets are created to (i) remove dubious cases; (ii) annotate to the best of the human visual system; and (iii) remove clearly erroneous bounding boxes (BBs). KITTI MoSeg has been reannotated three times according to the specified criteria, and three state-of-the-art deep neural network object detectors are used to evaluate them. The results clearly show that network performance is affected by ground truth variations, and that removing clear errors is beneficial for predicting real-world objects only for some networks. The relabeled datasets still present some cases with “arbitrary”/“controversial” annotations, and therefore, this work concludes with some guidelines related to dataset annotation, metadata/sublabels, and specific automotive use cases.
format	Article
id	doaj-art-bbd861b1714c45b5977d7db5a19f931e
institution	Kabale University
issn	2632-6736
language	English
publishDate	2024-01-01
publisher	Cambridge University Press
record_format	Article
series	Data-Centric Engineering
spelling	doaj-art-bbd861b1714c45b5977d7db5a19f931e; 2024-11-25T06:27:06Z; eng; Cambridge University Press; Data-Centric Engineering; 2632-6736; 2024-01-01; 5; 10.1017/dce.2024.39; The inconvenient truth of ground truth errors in automotive datasets and DNN-based detection; Pak Hung Chan (https://orcid.org/0000-0003-1705-5430), Boda Li, Gabriele Baris, Qasim Sadiq, Valentina Donzella (all: WMG, University of Warwick, Coventry, UK); https://www.cambridge.org/core/product/identifier/S263267362400039X/type/journal_article; machine learning; automated vehicles; automotive dataset; labeling
title	The inconvenient truth of ground truth errors in automotive datasets and DNN-based detection
topic	machine learning; automated vehicles; automotive dataset; labeling
url	https://www.cambridge.org/core/product/identifier/S263267362400039X/type/journal_article

Similar Items