Understanding Confusion: A Case Study of Training a Machine Model to Predict and Interpret Consensus From Volunteer Labels

Citizen science has become a valuable and reliable method for interpreting and processing big datasets, and is vital in the era of ever-growing data volumes. However, there are inherent difficulties in generating labels from citizen scientists, owing to the variability between the members...


Bibliographic Details
Main Authors: Ramanakumar Sankar, Kameswara Mantha, Cooper Nesmith, Lucy Fortson, Shawn Brueshaber, Candice Hansen-Koharcheck, Glenn Orton
Format: Article
Language:English
Published: Ubiquity Press 2024-12-01
Series:Citizen Science: Theory and Practice
Subjects:
Online Access:https://account.theoryandpractice.citizenscienceassociation.org/index.php/up-j-cstp/article/view/731
collection DOAJ
description Citizen science has become a valuable and reliable method for interpreting and processing big datasets, and is vital in the era of ever-growing data volumes. However, there are inherent difficulties in generating labels from citizen scientists, owing to the variability between the members of the crowd, which leads to variability in the results. Sometimes, this is useful — such as with serendipitous discoveries, which correspond to rare/unknown classes in the data — but it might also be due to ambiguity between classes. The primary issue is then to distinguish between the intrinsic variability in the dataset and the uncertainty in the citizen scientists’ responses, and to leverage that to extract scientifically useful relationships. In this paper, we explore using a neural network to interpret volunteer confusion across the dataset, to increase the purity of the downstream analysis. We focus on the use of learned features from the network to disentangle feature similarity across the classes, and the ability of the machine’s “attention” to identify features that lead to confusion. We use data from Jovian Vortex Hunter, a citizen science project to study vortices in Jupiter’s atmosphere, and find that the latent space from the model helps effectively identify different sources of image-level features that lead to low volunteer consensus. Furthermore, the machine’s attention highlights features corresponding to specific classes. This provides meaningful image-level feature–class relationships, which is useful in our analysis for identifying vortex-specific features to better understand vortex evolution mechanisms. Finally, we discuss the applicability of this method to other citizen science projects.
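The “volunteer consensus” the abstract refers to can be illustrated with a minimal sketch. The label values and aggregation below are hypothetical (the project’s actual pipeline is not described in this record); the idea is simply that the plurality vote fraction and the normalized entropy of each image’s label distribution quantify agreement, and high-entropy (low-consensus) images are the ones the network is trained to flag as confusing.

```python
from collections import Counter
import math

def consensus(labels):
    """For one image's volunteer labels, return the plurality vote
    fraction and the normalized entropy of the label distribution
    (0 = full agreement, 1 = votes split evenly across classes)."""
    counts = Counter(labels)
    n = len(labels)
    fracs = [c / n for c in counts.values()]
    plurality = max(fracs)
    if len(counts) == 1:
        entropy = 0.0  # unanimous: no confusion
    else:
        entropy = -sum(f * math.log(f) for f in fracs) / math.log(len(counts))
    return plurality, entropy

# Hypothetical volunteer labels for two images:
agreed = ["vortex"] * 9 + ["cloud"]          # near-unanimous
confused = ["vortex"] * 5 + ["cloud"] * 5    # evenly split

print(consensus(agreed))    # high plurality, low entropy
print(consensus(confused))  # plurality 0.5, entropy 1.0
```

Images whose entropy exceeds some threshold would be the “low volunteer consensus” subjects whose image-level features the paper inspects via the model’s latent space and attention maps.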
institution Kabale University
issn 2057-4991
spelling Ubiquity Press, Citizen Science: Theory and Practice, ISSN 2057-4991, published 2024-12-01, DOI: 10.5334/cstp.731
Ramanakumar Sankar (University of California, Berkeley) https://orcid.org/0000-0002-6794-7587
Kameswara Mantha (University of Minnesota, Twin Cities) https://orcid.org/0000-0002-6016-300X
Cooper Nesmith (University of Minnesota, Twin Cities)
Lucy Fortson (University of Minnesota, Twin Cities) https://orcid.org/0000-0002-1067-8558
Shawn Brueshaber (Michigan Technological University) https://orcid.org/0000-0002-3669-0539
Candice Hansen-Koharcheck (Planetary Science Institute)
Glenn Orton (Jet Propulsion Laboratory/California Institute of Technology) https://orcid.org/0000-0001-7871-2823
https://account.theoryandpractice.citizenscienceassociation.org/index.php/up-j-cstp/article/view/731
topic citizen science
crowdsourcing
auto-encoder
semi-supervised network
machine attention
planetary atmospheres