Exploring Score-Level and Decision-Level Fusion of Inertial and Video Data for Intake Gesture Detection

Recent research has employed deep learning to detect intake gestures from inertial sensor and video camera data. However, the fusion of these modalities has not been attempted. The present research explores the potential of fusing the outputs of two individual deep learning inertial and video intake...

Bibliographic Details
Main Authors: Hamid Heydarian, Marc T. P. Adam, Tracy L. Burrows, Megan E. Rollo
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9567689/
_version_ 1841563310523154432
author Hamid Heydarian
Marc T. P. Adam
Tracy L. Burrows
Megan E. Rollo
author_facet Hamid Heydarian
Marc T. P. Adam
Tracy L. Burrows
Megan E. Rollo
author_sort Hamid Heydarian
collection DOAJ
description Recent research has employed deep learning to detect intake gestures from inertial sensor and video camera data. However, the fusion of these modalities has not been attempted. The present research explores the potential of fusing the outputs of two individual deep learning inertial and video intake gesture detection models (i.e., score-level and decision-level fusion) using the test sets from two publicly available multimodal datasets: (1) OREBA-DIS recorded from 100 participants while consuming food served in discrete portions and (2) OREBA-SHA recorded from 102 participants while consuming a communal dish. We first assess the potential of fusion by contrasting the performance of the individual models in intake gesture detection. The assessment shows that fusing the outputs of individual models is more promising on the OREBA-DIS dataset. Subsequently, we conduct experiments using different score-level and decision-level fusion approaches. Our results from fusion show that the score-level fusion approach of max score model performs best of all considered fusion approaches. On the OREBA-DIS dataset, the max score fusion approach (<inline-formula> <tex-math notation="LaTeX">$F_{1} =0.871$ </tex-math></inline-formula>) outperforms both individual video (<inline-formula> <tex-math notation="LaTeX">$F_{1} =0.855$ </tex-math></inline-formula>) and inertial (<inline-formula> <tex-math notation="LaTeX">$F_{1} =0.806$ </tex-math></inline-formula>) models. However, on the OREBA-SHA dataset, the max score fusion approach (<inline-formula> <tex-math notation="LaTeX">$F_{1} =0.873$ </tex-math></inline-formula>) fails to outperform the individual inertial model (<inline-formula> <tex-math notation="LaTeX">$F_{1} =0.895$ </tex-math></inline-formula>). Pairwise comparisons using bootstrapped samples confirm the statistical significance of these differences in model performance (<inline-formula> <tex-math notation="LaTeX">$p \lt $ </tex-math></inline-formula>.001).
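The max score fusion described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the per-frame probabilities, the 0.5 threshold, and the variable names are all assumptions made for demonstration.

```python
# Hypothetical per-frame intake-gesture probabilities from two
# independent detection models (values are illustrative only).
inertial_scores = [0.2, 0.7, 0.9, 0.1]
video_scores = [0.6, 0.4, 0.8, 0.3]

# Score-level "max score" fusion: per frame, keep the higher of the
# two model scores, then threshold to obtain a detection decision.
fused = [max(i, v) for i, v in zip(inertial_scores, video_scores)]
decisions = [score >= 0.5 for score in fused]

print(fused)      # [0.6, 0.7, 0.9, 0.3]
print(decisions)  # [True, True, True, False]
```

Decision-level fusion, by contrast, would combine the two models' thresholded outputs (e.g., by logical OR/AND) rather than their raw scores.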
format Article
id doaj-art-9ca7b659c73e4fc9894fb51d4e2f24d3
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-9ca7b659c73e4fc9894fb51d4e2f24d32025-01-03T00:01:59ZengIEEEIEEE Access2169-35362025-01-011364365510.1109/ACCESS.2021.31192539567689Exploring Score-Level and Decision-Level Fusion of Inertial and Video Data for Intake Gesture DetectionHamid Heydarian0https://orcid.org/0000-0002-9824-5828Marc T. P. Adam1https://orcid.org/0000-0002-6036-4282Tracy L. Burrows2Megan E. Rollo3School of Information and Physical Sciences, The University of Newcastle, Callaghan, NSW, AustraliaSchool of Information and Physical Sciences, The University of Newcastle, Callaghan, NSW, AustraliaPriority Research Centre for Physical Activity and Nutrition, The University of Newcastle, Callaghan, NSW, AustraliaPriority Research Centre for Physical Activity and Nutrition, The University of Newcastle, Callaghan, NSW, AustraliaRecent research has employed deep learning to detect intake gestures from inertial sensor and video camera data. However, the fusion of these modalities has not been attempted. The present research explores the potential of fusing the outputs of two individual deep learning inertial and video intake gesture detection models (i.e., score-level and decision-level fusion) using the test sets from two publicly available multimodal datasets: (1) OREBA-DIS recorded from 100 participants while consuming food served in discrete portions and (2) OREBA-SHA recorded from 102 participants while consuming a communal dish. We first assess the potential of fusion by contrasting the performance of the individual models in intake gesture detection. The assessment shows that fusing the outputs of individual models is more promising on the OREBA-DIS dataset. Subsequently, we conduct experiments using different score-level and decision-level fusion approaches. Our results from fusion show that the score-level fusion approach of max score model performs best of all considered fusion approaches. 
On the OREBA-DIS dataset, the max score fusion approach (<inline-formula> <tex-math notation="LaTeX">$F_{1} =0.871$ </tex-math></inline-formula>) outperforms both individual video (<inline-formula> <tex-math notation="LaTeX">$F_{1} =0.855$ </tex-math></inline-formula>) and inertial (<inline-formula> <tex-math notation="LaTeX">$F_{1} =0.806$ </tex-math></inline-formula>) models. However, on the OREBA-SHA dataset, the max score fusion approach (<inline-formula> <tex-math notation="LaTeX">$F_{1} =0.873$ </tex-math></inline-formula>) fails to outperform the individual inertial model (<inline-formula> <tex-math notation="LaTeX">$F_{1} =0.895$ </tex-math></inline-formula>). Pairwise comparisons using bootstrapped samples confirm the statistical significance of these differences in model performance (<inline-formula> <tex-math notation="LaTeX">$p \lt $ </tex-math></inline-formula>.001).https://ieeexplore.ieee.org/document/9567689/Score-level fusiondecision-level fusionintake gesture detectiondeep learninginertialaccelerometer
spellingShingle Hamid Heydarian
Marc T. P. Adam
Tracy L. Burrows
Megan E. Rollo
Exploring Score-Level and Decision-Level Fusion of Inertial and Video Data for Intake Gesture Detection
IEEE Access
Score-level fusion
decision-level fusion
intake gesture detection
deep learning
inertial
accelerometer
title Exploring Score-Level and Decision-Level Fusion of Inertial and Video Data for Intake Gesture Detection
title_full Exploring Score-Level and Decision-Level Fusion of Inertial and Video Data for Intake Gesture Detection
title_fullStr Exploring Score-Level and Decision-Level Fusion of Inertial and Video Data for Intake Gesture Detection
title_full_unstemmed Exploring Score-Level and Decision-Level Fusion of Inertial and Video Data for Intake Gesture Detection
title_short Exploring Score-Level and Decision-Level Fusion of Inertial and Video Data for Intake Gesture Detection
title_sort exploring score level and decision level fusion of inertial and video data for intake gesture detection
topic Score-level fusion
decision-level fusion
intake gesture detection
deep learning
inertial
accelerometer
url https://ieeexplore.ieee.org/document/9567689/
work_keys_str_mv AT hamidheydarian exploringscorelevelanddecisionlevelfusionofinertialandvideodataforintakegesturedetection
AT marctpadam exploringscorelevelanddecisionlevelfusionofinertialandvideodataforintakegesturedetection
AT tracylburrows exploringscorelevelanddecisionlevelfusionofinertialandvideodataforintakegesturedetection
AT meganerollo exploringscorelevelanddecisionlevelfusionofinertialandvideodataforintakegesturedetection