FPGA Acceleration With Hessian-Based Comprehensive Intra-Layer Mixed-Precision Quantization for Transformer Models

Recent advances in using FPGAs as co-processors for language-model acceleration, valued for their energy efficiency and flexibility, are constrained by limited on-chip memory capacity, which hinders the deployment of transformer-based language models. To address this challenge, we propose...
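For orientation, the sketch below illustrates the general idea behind Hessian-based mixed-precision quantization: estimate each parameter group's Hessian trace with Hutchinson's method and grant higher bit-widths to more sensitive groups. This is a minimal HAWQ-style illustration, not the authors' intra-layer scheme; the function names (hessian_trace, assign_bits), the sample count, and the bit budgets are all assumptions for demonstration.

    # Hypothetical sketch of Hessian-trace-guided bit allocation (not the paper's method).
    import torch

    def hessian_trace(loss, param, n_samples=8):
        """Estimate trace(H) for `param` via Hutchinson's method.
        `loss` must be built with a graph (create_graph=True upstream)."""
        grad, = torch.autograd.grad(loss, param, create_graph=True)
        trace = 0.0
        for _ in range(n_samples):
            # Rademacher probe vector v with entries in {-1, +1}
            v = torch.randint_like(param, high=2, dtype=param.dtype) * 2 - 1
            # Hessian-vector product H v via a second backward pass
            hv, = torch.autograd.grad(grad, param, grad_outputs=v, retain_graph=True)
            trace += (v * hv).sum().item()
        return trace / n_samples

    def assign_bits(traces, budgets=(8, 4, 2)):
        """Assign higher precision to parameter groups with larger Hessian trace."""
        order = sorted(range(len(traces)), key=lambda i: -traces[i])
        bits = [0] * len(traces)
        per_tier = max(1, len(traces) // len(budgets))
        for rank, idx in enumerate(order):
            bits[idx] = budgets[min(rank // per_tier, len(budgets) - 1)]
        return bits

In this style of approach, groups whose loss curvature is high (large Hessian trace) keep more bits, while flat directions tolerate aggressive quantization, which is what frees FPGA memory without a steep accuracy cost.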


Bibliographic Details
Main Authors: Woohong Byun, Jongseok Woo, Saibal Mukhopadhyay
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10973048/