Optimal Proxy Selection for Socioeconomic Status Inference on Twitter

Bibliographic Details
Main Authors:	Jacob Levy Abitbol, Eric Fleury, Márton Karsai
Format:	Article
Language:	English
Published:	Wiley 2019-01-01
Series:	Complexity
Online Access:	http://dx.doi.org/10.1155/2019/6059673

Collection:	DOAJ
Institution:	Kabale University
Record ID:	doaj-art-41b0fe02eea24d7ca9f267e1e12d1199
ISSN:	1076-2787, 1099-0526
Affiliations:	Jacob Levy Abitbol: Univ Lyon, Inria, CNRS, ENS de Lyon, Université Claude Bernard Lyon 1, LIP UMR 5668, F-69007 Lyon, France
	Eric Fleury: Inria, F-75012 Paris, France
	Márton Karsai: Univ Lyon, Inria, CNRS, ENS de Lyon, Université Claude Bernard Lyon 1, LIP UMR 5668, F-69007 Lyon, France
Description:	Individual socioeconomic status inference from online traces is a remarkably difficult task. While current methods commonly train predictive models on incomplete data by appending socioeconomic information of residential areas or professional occupation profiles, little attention has been paid to how well this information serves as a proxy for the individual demographic trait of interest when fed to a learning model. Here we address this question by proposing three different data collection and combination methods to first estimate and, in turn, infer the socioeconomic status of French Twitter users from their online semantics. We assess the validity of each proxy measure by analyzing the performance of our prediction pipeline when trained on these datasets. Despite having to rely on different user sets, we find that training our model on professional occupation provides better predictive performance than open census data or remote sensed expert annotation of habitual environments. Furthermore, we release the tools we developed in the hope it will provide a generalizable framework to estimate socioeconomic status of large numbers of Twitter users as well as contribute to the scientific discussion on social stratification and inequalities.

Similar Items