Era- and Genre-Specific Stop Word Lists for Low-Resource Computational Research: A Classical Latin 'Exemplum'

Bibliographic Details
Main Authors: Rachel E. Dubit, Annie K. Lamar
Format: Article
Language: English
Published: Ubiquity Press 2024-11-01
Series: Journal of Open Humanities Data
Online Access: https://account.openhumanitiesdata.metajnl.com/index.php/up-j-johd/article/view/246
Description
Summary: In this data paper, we argue that computational researchers, particularly those working in low-resource contexts, should consult with linguistic specialists to create targeted stop lists developed with specific eras, genres, authors, or contexts in mind. We offer an exemplum of stop lists targeted at Augustan Latin poetry. Our open-access stop lists, available as standalone files alongside a command-line Python script, can serve as a starting point for other eras or genres of Latin literature. More broadly, the transdisciplinary and collaborative process by which these stop lists were created is of significant benefit to low-resource computational linguistics research teams.
ISSN: 2059-481X