Are queries and keys always relevant? A case study on transformer wave functions

The dot-product attention mechanism, originally designed for natural language processing tasks, is a cornerstone of modern Transformers. It adeptly captures semantic relationships between word pairs in sentences by computing a similarity overlap between queries and keys. In this work, we explore the suitability of Transformers, focusing on their attention mechanisms, for the parametrization of variational wave functions that approximate ground states of quantum many-body spin Hamiltonians. Specifically, we perform numerical simulations of the two-dimensional J1–J2 Heisenberg model, a common benchmark for quantum many-body systems on the lattice. By comparing the performance of the standard attention mechanism with a simplified version that excludes queries and keys and relies solely on positions, we achieve competitive results while reducing computational cost and parameter count. Furthermore, by analyzing the attention maps generated by the standard attention mechanism, we show that the attention weights become effectively input-independent at the end of the optimization. We support the numerical results with analytical calculations, providing physical insight into why queries and keys should, in principle, be omitted from the attention mechanism when studying large systems.
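
To make the comparison above concrete, the sketch below (a minimal illustration, not the authors' implementation; the use of NumPy, all names, and the array shapes are assumptions) contrasts standard dot-product attention, whose weights depend on the input through queries and keys, with the simplified variant in which the weights are a single learned matrix of positional couplings, independent of the input:

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(X, Wq, Wk, Wv):
    # Standard attention: the (N, N) weights depend on the input X via queries and keys.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V

def positional_attention(X, A, Wv):
    # Simplified attention: a learned (N, N) matrix A of positional couplings,
    # identical for every input (normalized with a softmax here only for comparability).
    return softmax(A) @ (X @ Wv)

# Toy usage: N = 4 patches of spins embedded in d = 8 dimensions (hypothetical sizes).
rng = np.random.default_rng(0)
N, d = 4, 8
X = rng.normal(size=(N, d))                        # input embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
A = rng.normal(size=(N, N))                        # positional couplings, no queries or keys
print(dot_product_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
print(positional_attention(X, A, Wv).shape)        # (4, 8)

Dropping the query and key projections removes two d x d weight matrices per attention head and leaves weights that do not change with the sampled spin configuration, consistent with the observation above that the optimized attention maps become effectively input-independent.
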


Bibliographic Details
Main Authors: Riccardo Rende, Luciano Loris Viteritti
Format: Article
Language: English
Published: IOP Publishing 2025-01-01
Series: Machine Learning: Science and Technology
Subjects: neural network quantum states; variational Monte Carlo; vision transformer wave function; attention mechanisms
Online Access: https://doi.org/10.1088/2632-2153/ada1a0
Collection: DOAJ
ISSN: 2632-2153
ORCID: Riccardo Rende, https://orcid.org/0000-0001-5656-4241; Luciano Loris Viteritti, https://orcid.org/0009-0004-2332-7943
Affiliations: International School for Advanced Studies, Trieste, Italy; University of Trieste, Trieste, Italy
Citation: Machine Learning: Science and Technology 6(1), 010501 (2025)