A framework for evaluating cultural bias and historical misconceptions in LLMs outputs
Large Language Models (LLMs), while powerful, often perpetuate cultural biases and historical inaccuracies from their training data, marginalizing underrepresented perspectives. To address these issues, we introduce a structured framework to systematically evaluate and quantify these deficiencies. O...
Saved in:
| Main Authors: | Moon-Kuen Mak, Tiejian Luo |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | KeAi Communications Co. Ltd., 2025-09-01 |
| Series: | BenchCouncil Transactions on Benchmarks, Standards and Evaluations |
| Subjects: | Large language model; Artificial intelligence; Cultural bias; Historical misconception; human-in-the-loop |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2772485925000481 |
| _version_ | 1849228333405962240 |
|---|---|
| author | Moon-Kuen Mak; Tiejian Luo |
| author_facet | Moon-Kuen Mak; Tiejian Luo |
| author_sort | Moon-Kuen Mak |
| collection | DOAJ |
| description | Large Language Models (LLMs), while powerful, often perpetuate cultural biases and historical inaccuracies from their training data, marginalizing underrepresented perspectives. To address these issues, we introduce a structured framework to systematically evaluate and quantify these deficiencies. Our methodology combines culturally sensitive prompting with two novel metrics: the Cultural Bias Score (CBS) and the Historical Misconception Score (HMS). Our analysis reveals varying cultural biases across LLMs, with certain Western-centric models, such as Gemini, exhibiting higher bias. In contrast, other models, including ChatGPT and Poe, demonstrate more balanced cultural narratives. We also find that historical misconceptions are most prevalent for less-documented events, underscoring the critical need for training data diversification. Our framework suggests the potential effectiveness of bias-mitigation techniques, including dataset augmentation and human-in-the-loop (HITL) verification. Empirical validation of these strategies remains an important direction for future work. This work provides a replicable and scalable methodology for developers and researchers to help ensure the responsible and equitable deployment of LLMs in critical domains such as education and content moderation. |
| format | Article |
| id | doaj-art-e53e65f60b0b4969a9898f8e6e77533f |
| institution | Kabale University |
| issn | 2772-4859 |
| language | English |
| publishDate | 2025-09-01 |
| publisher | KeAi Communications Co. Ltd. |
| record_format | Article |
| series | BenchCouncil Transactions on Benchmarks, Standards and Evaluations |
| spelling | doaj-art-e53e65f60b0b4969a9898f8e6e77533f; 2025-08-23T04:49:53Z; eng; KeAi Communications Co. Ltd.; BenchCouncil Transactions on Benchmarks, Standards and Evaluations; ISSN 2772-4859; 2025-09-01; vol. 5, no. 3, art. 100235; doi:10.1016/j.tbench.2025.100235; A framework for evaluating cultural bias and historical misconceptions in LLMs outputs; Moon-Kuen Mak (Institute for the History of Natural Sciences, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China); Tiejian Luo (University of Chinese Academy of Sciences, Beijing, China; corresponding author); Large Language Models (LLMs), while powerful, often perpetuate cultural biases and historical inaccuracies from their training data, marginalizing underrepresented perspectives. To address these issues, we introduce a structured framework to systematically evaluate and quantify these deficiencies. Our methodology combines culturally sensitive prompting with two novel metrics: the Cultural Bias Score (CBS) and the Historical Misconception Score (HMS). Our analysis reveals varying cultural biases across LLMs, with certain Western-centric models, such as Gemini, exhibiting higher bias. In contrast, other models, including ChatGPT and Poe, demonstrate more balanced cultural narratives. We also find that historical misconceptions are most prevalent for less-documented events, underscoring the critical need for training data diversification. Our framework suggests the potential effectiveness of bias-mitigation techniques, including dataset augmentation and human-in-the-loop (HITL) verification. Empirical validation of these strategies remains an important direction for future work. This work provides a replicable and scalable methodology for developers and researchers to help ensure the responsible and equitable deployment of LLMs in critical domains such as education and content moderation. http://www.sciencedirect.com/science/article/pii/S2772485925000481; Large language model; Artificial intelligence; Cultural bias; Historical misconception; human-in-the-loop |
| spellingShingle | Moon-Kuen Mak; Tiejian Luo; A framework for evaluating cultural bias and historical misconceptions in LLMs outputs; BenchCouncil Transactions on Benchmarks, Standards and Evaluations; Large language model; Artificial intelligence; Cultural bias; Historical misconception; human-in-the-loop |
| title | A framework for evaluating cultural bias and historical misconceptions in LLMs outputs |
| title_full | A framework for evaluating cultural bias and historical misconceptions in LLMs outputs |
| title_fullStr | A framework for evaluating cultural bias and historical misconceptions in LLMs outputs |
| title_full_unstemmed | A framework for evaluating cultural bias and historical misconceptions in LLMs outputs |
| title_short | A framework for evaluating cultural bias and historical misconceptions in LLMs outputs |
| title_sort | framework for evaluating cultural bias and historical misconceptions in llms outputs |
| topic | Large language model; Artificial intelligence; Cultural bias; Historical misconception; human-in-the-loop |
| url | http://www.sciencedirect.com/science/article/pii/S2772485925000481 |
| work_keys_str_mv | AT moonkuenmak aframeworkforevaluatingculturalbiasandhistoricalmisconceptionsinllmsoutputs AT tiejianluo aframeworkforevaluatingculturalbiasandhistoricalmisconceptionsinllmsoutputs AT moonkuenmak frameworkforevaluatingculturalbiasandhistoricalmisconceptionsinllmsoutputs AT tiejianluo frameworkforevaluatingculturalbiasandhistoricalmisconceptionsinllmsoutputs |