Are queries and keys always relevant? A case study on transformer wave functions

The dot-product attention mechanism, originally designed for natural language processing tasks, is a cornerstone of modern Transformers. It adeptly captures semantic relationships between word pairs in sentences by computing a similarity overlap between queries and keys. In this work, we explore the suitability of Transformers, focusing on their attention mechanisms, for the parametrization of variational wave functions that approximate ground states of quantum many-body spin Hamiltonians. Specifically, we perform numerical simulations of the two-dimensional J1–J2 Heisenberg model, a common benchmark for quantum many-body systems on the lattice. By comparing the performance of the standard attention mechanism with a simplified version that excludes queries and keys and relies solely on positions, we achieve competitive results while reducing computational cost and parameter count. Furthermore, by analyzing the attention maps generated by the standard attention mechanism, we show that the attention weights become effectively input-independent at the end of the optimization. We support the numerical results with analytical calculations, providing physical insight into why queries and keys should, in principle, be omitted from the attention mechanism when studying large systems.
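
To make the comparison above concrete, the sketch below (a minimal illustration, not the authors' implementation; the use of NumPy, all names, and the array shapes are assumptions) contrasts standard dot-product attention, whose weights depend on the input through queries and keys, with the simplified variant in which the weights are a single learned matrix of positional couplings, independent of the input:

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(X, Wq, Wk, Wv):
    # Standard attention: the (N, N) weights depend on the input X via queries and keys.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V

def positional_attention(X, A, Wv):
    # Simplified attention: a learned (N, N) matrix A of positional couplings,
    # identical for every input (normalized with a softmax here only for comparability).
    return softmax(A) @ (X @ Wv)

# Toy usage: N = 4 patches of spins embedded in d = 8 dimensions (hypothetical sizes).
rng = np.random.default_rng(0)
N, d = 4, 8
X = rng.normal(size=(N, d))                        # input embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
A = rng.normal(size=(N, N))                        # positional couplings, no queries or keys
print(dot_product_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
print(positional_attention(X, A, Wv).shape)        # (4, 8)

Dropping the query and key projections removes two d x d weight matrices per attention head and leaves weights that do not change with the sampled spin configuration, consistent with the observation above that the optimized attention maps become effectively input-independent.
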


Bibliographic Details
Main Authors: Riccardo Rende, Luciano Loris Viteritti
Format: Article
Language: English
Published: IOP Publishing 2025-01-01
Series: Machine Learning: Science and Technology
Subjects: neural network quantum states; variational Monte Carlo; vision transformer wave function; attention mechanisms
Online Access: https://doi.org/10.1088/2632-2153/ada1a0
Collection: DOAJ
ISSN: 2632-2153
ORCID: Riccardo Rende, https://orcid.org/0000-0001-5656-4241; Luciano Loris Viteritti, https://orcid.org/0009-0004-2332-7943
Affiliations: International School for Advanced Studies, Trieste, Italy; University of Trieste, Trieste, Italy
Citation: Machine Learning: Science and Technology 6(1), 010501 (2025)