The sociolinguistic foundations of language modeling
In this article, we introduce a sociolinguistic perspective on language modeling. We claim that language models in general are inherently modeling varieties of language, and we consider how this insight can inform the development and deployment of language models. We begin by presenting a technical...
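Main Authors: | Jack Grieve, Sara Bartl, Matteo Fuoli, Jason Grafmiller, Weihang Huang, Alejandro Jawerbaum, Akira Murakami, Marcus Perlman, Dana Roemling, Bodo Winter |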
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2025-01-01 |
Series: | Frontiers in Artificial Intelligence |
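Subjects: | AI ethics, artificial intelligence, computational sociolinguistics, corpus linguistics, large language models, natural language processing |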
Online Access: | https://www.frontiersin.org/articles/10.3389/frai.2024.1472411/full |
_version_ | 1841543743804538880 |
---|---|
author | Jack Grieve; Sara Bartl; Matteo Fuoli; Jason Grafmiller; Weihang Huang; Alejandro Jawerbaum; Akira Murakami; Marcus Perlman; Dana Roemling; Bodo Winter |
author_facet | Jack Grieve; Sara Bartl; Matteo Fuoli; Jason Grafmiller; Weihang Huang; Alejandro Jawerbaum; Akira Murakami; Marcus Perlman; Dana Roemling; Bodo Winter |
author_sort | Jack Grieve |
collection | DOAJ |
description | In this article, we introduce a sociolinguistic perspective on language modeling. We claim that language models in general are inherently modeling varieties of language, and we consider how this insight can inform the development and deployment of language models. We begin by presenting a technical definition of the concept of a variety of language as developed in sociolinguistics. We then discuss how this perspective could help us better understand five basic challenges in language modeling: social bias, domain adaptation, alignment, language change, and scale. We argue that to maximize the performance and societal value of language models it is important to carefully compile training corpora that accurately represent the specific varieties of language being modeled, drawing on theories, methods, and descriptions from the field of sociolinguistics. |
format | Article |
id | doaj-art-2e009cf201384ca8b46d1fcc575a595f |
institution | Kabale University |
issn | 2624-8212 |
language | English |
publishDate | 2025-01-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Artificial Intelligence |
spelling | doaj-art-2e009cf201384ca8b46d1fcc575a595f; 2025-01-13T06:11:01Z; eng; Frontiers Media S.A.; Frontiers in Artificial Intelligence; 2624-8212; 2025-01-01; 7; 10.3389/frai.2024.1472411; 1472411; The sociolinguistic foundations of language modeling; Jack Grieve; Sara Bartl; Matteo Fuoli; Jason Grafmiller; Weihang Huang; Alejandro Jawerbaum; Akira Murakami; Marcus Perlman; Dana Roemling; Bodo Winter; In this article, we introduce a sociolinguistic perspective on language modeling. We claim that language models in general are inherently modeling varieties of language, and we consider how this insight can inform the development and deployment of language models. We begin by presenting a technical definition of the concept of a variety of language as developed in sociolinguistics. We then discuss how this perspective could help us better understand five basic challenges in language modeling: social bias, domain adaptation, alignment, language change, and scale. We argue that to maximize the performance and societal value of language models it is important to carefully compile training corpora that accurately represent the specific varieties of language being modeled, drawing on theories, methods, and descriptions from the field of sociolinguistics.; https://www.frontiersin.org/articles/10.3389/frai.2024.1472411/full; AI ethics; artificial intelligence; computational sociolinguistics; corpus linguistics; large language models; natural language processing |
spellingShingle | Jack Grieve; Sara Bartl; Matteo Fuoli; Jason Grafmiller; Weihang Huang; Alejandro Jawerbaum; Akira Murakami; Marcus Perlman; Dana Roemling; Bodo Winter; The sociolinguistic foundations of language modeling; Frontiers in Artificial Intelligence; AI ethics; artificial intelligence; computational sociolinguistics; corpus linguistics; large language models; natural language processing |
title | The sociolinguistic foundations of language modeling |
title_full | The sociolinguistic foundations of language modeling |
title_fullStr | The sociolinguistic foundations of language modeling |
title_full_unstemmed | The sociolinguistic foundations of language modeling |
title_short | The sociolinguistic foundations of language modeling |
title_sort | sociolinguistic foundations of language modeling |
topic | AI ethics; artificial intelligence; computational sociolinguistics; corpus linguistics; large language models; natural language processing |
url | https://www.frontiersin.org/articles/10.3389/frai.2024.1472411/full |