Establishing vocabulary tests as a benchmark for evaluating large language models.
Vocabulary tests, once a cornerstone of language modeling evaluation, have been largely overlooked in the current landscape of Large Language Models (LLMs) like Llama 2, Mistral, and GPT. While most LLM evaluation benchmarks focus on specific tasks or domain-specific knowledge, they often neglect th...
| Main Authors: | Gonzalo Martínez, Javier Conde, Elena Merino-Gómez, Beatriz Bermúdez-Margaretto, José Alberto Hernández, Pedro Reviriego, Marc Brysbaert |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS), 2024-01-01 |
| Series: | PLoS ONE |
| Online Access: | https://doi.org/10.1371/journal.pone.0308259 |
Similar Items
- Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans
  by: Pedro Reviriego, et al.
  Published: (2024-12-01)
- Benchmarking Large Language Models for News Summarization
  by: Tianyi Zhang, et al.
  Published: (2024-02-01)
- Survey of Different Large Language Model Architectures: Trends, Benchmarks, and Challenges
  by: Minghao Shao, et al.
  Published: (2024-01-01)
- Towards a benchmark dataset for large language models in the context of process automation
  by: Tejennour Tizaoui, et al.
  Published: (2024-12-01)
- Large Language Model-Driven Structured Output: A Comprehensive Benchmark and Spatial Data Generation Framework
  by: Diya Li, et al.
  Published: (2024-11-01)