System-Technology Co-Optimization for Dense Edge Architectures Using 3-D Integration and Nonvolatile Memory

Bibliographic Details
Main Authors: Leandro M. Giacomini Rocha, Mohamed Naeim, Guilherme Paim, Moritz Brunion, Priya Venugopal, Dragomir Milojevic, James Myers, Mustafa Badaroglu, Marian Verhelst, Julien Ryckaert, Dwaipayan Biswas
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Journal on Exploratory Solid-State Computational Devices and Circuits
Online Access: https://ieeexplore.ieee.org/document/10750212/
Description
Summary: High-performance edge artificial intelligence (Edge-AI) inference applications aim for high energy efficiency, memory density, and small form factor, requiring a design-space exploration across the whole stack: workloads, architecture, mapping, and co-optimization with emerging technology. In this article, we present a system-technology co-optimization (STCO) framework that interfaces with workload-driven system scaling challenges and physical design-enabled technology offerings. The framework is built on three engines: a physical design characterization engine, a dataflow mapping optimizer, and a system efficiency predictor. The framework builds on a systolic array accelerator to provide the design-technology characterization points, using the advanced imec A10 nanosheet CMOS node along with emerging, high-density voltage-gated spin-orbit torque (VGSOT) magnetic memories (MRAM), combined with memory-on-logic fine-pitch 3-D wafer-to-wafer hybrid bonding. We observe that 3-D system integration of a static random-access memory (SRAM)-based design leads to 9% power savings with 53% footprint reduction at iso-frequency with respect to a 2-D implementation with the same memory capacity. Three-dimensional nonvolatile memory (NVM) based on VGSOT allows a $4\times$ memory capacity increase with 30% footprint reduction at iso-power compared with the $1\times$ 2-D SRAM case. Our exploration with two diverse workloads, image resolution enhancement (FSRCNN) and eye tracking (EDSNet), shows that more resources allow better workload mapping possibilities, which can compensate for the degradation in peak system energy efficiency in high-memory-capacity cases. We show that a 25% peak efficiency reduction at $32\times$ memory capacity can lead to $7.4\times$ faster execution with $5.7\times$ higher effective TOPS/W than the $1\times$ memory capacity case on the same technology.
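
The ratios quoted in the summary can be combined into a few back-of-the-envelope figures. The sketch below is illustrative only and is not part of the authors' STCO framework: the function names are hypothetical, and the split of effective TOPS/W into a peak term times a mapping term is an assumption made here to show the arithmetic, not a model stated in the article.

```python
# Back-of-the-envelope ratios derived from the figures quoted in the summary.
# All names are hypothetical; nothing here reproduces the authors' framework.


def capacity_per_footprint(capacity_ratio: float, footprint_reduction: float) -> float:
    """Memory capacity per unit footprint, relative to the 2-D SRAM 1x case."""
    return capacity_ratio / (1.0 - footprint_reduction)


def implied_mapping_gain(effective_ratio: float, peak_reduction: float) -> float:
    """Gain left over once the peak-efficiency loss is factored out.

    ASSUMPTION: effective efficiency = peak efficiency * mapping-dependent factor.
    """
    return effective_ratio / (1.0 - peak_reduction)


# 3-D SRAM: same capacity, 53% smaller footprint than the 2-D implementation.
sram_3d_density = capacity_per_footprint(1.0, 0.53)

# 3-D VGSOT: 4x capacity, 30% smaller footprint than 2-D SRAM 1x.
vgsot_3d_density = capacity_per_footprint(4.0, 0.30)

# 32x capacity case: 25% lower peak efficiency, 5.7x higher effective TOPS/W.
mapping_gain = implied_mapping_gain(5.7, 0.25)

print(f"3-D SRAM capacity per footprint:  {sram_3d_density:.2f}x")
print(f"3-D VGSOT capacity per footprint: {vgsot_3d_density:.2f}x")
print(f"Implied mapping-related gain:     {mapping_gain:.2f}x")
```

Under that assumed decomposition, the mapping-related factor would have to improve by roughly 7.6 times to turn a 25% lower peak efficiency into a 5.7 times higher effective TOPS/W.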
ISSN: 2329-9231