Exploring the impact of fixed theta values in RoPE on character-level language model performance and efficiency

Rotary Positional Embedding (RoPE) is a widely used technique in Transformers, influenced by the hyperparameter theta (θ). However, the impact of varying *fixed* theta values, especially the trade-off between performance and efficiency on tasks like character-level modeling, remains under-explored....

Bibliographic Details
Main Authors: Zhigao Huang, Musheng Chen, Shiyan Zheng
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-08-01
Series: Frontiers in Computer Science
Online Access: https://www.frontiersin.org/articles/10.3389/fcomp.2025.1626899/full