ML-NIC: accelerating machine learning inference using smart network interface cards


Bibliographic Details
Main Authors: Raghav Kapoor, David C. Anastasiu, Sean Choi
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-01-01
Series: Frontiers in Computer Science
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/fcomp.2024.1493399/full
author Raghav Kapoor
David C. Anastasiu
Sean Choi
author_facet Raghav Kapoor
David C. Anastasiu
Sean Choi
author_sort Raghav Kapoor
collection DOAJ
description Low-latency inference for machine learning models is increasingly becoming a necessary requirement, as these models are used in mission-critical applications such as autonomous driving, military defense (e.g., target recognition), and network traffic analysis. A widely studied and used technique to overcome this challenge is to offload some or all of the inference tasks onto specialized hardware such as graphics processing units. More recently, offloading machine learning inference onto programmable network devices, such as programmable network interface cards or programmable switches, has been gaining interest in both industry and academia, especially due to the latency reduction and computational benefits of performing inference directly on the data plane, where network packets are processed. Yet, current approaches are relatively limited in scope, and there is a need for more general approaches to mapping machine learning models onto programmable network devices. To fulfill this need, this work introduces a novel framework, called ML-NIC, for deploying trained machine learning models onto the data planes of programmable network devices. ML-NIC deploys models directly into the computational cores of the devices to efficiently leverage their inherent parallelism, thus providing large latency and throughput gains. Our experiments show that ML-NIC reduced inference latency by at least 6× both on average and at the 99th percentile, and increased throughput by at least 16×, with little to no degradation in model effectiveness compared to existing CPU solutions. In addition, ML-NIC can provide tighter guaranteed latency bounds in the presence of other network traffic, with shorter tail latencies. Furthermore, ML-NIC reduces CPU utilization by 6.65% and host server RAM usage by 320.80 MB. Finally, ML-NIC can handle machine learning models that are 2.25× larger than those supported by current state-of-the-art network device offloading approaches.
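The approach the abstract describes (dispatching packets across a SmartNIC's compute cores, each running a copy of the trained model on the data plane) can be sketched in simplified form. The following Python toy is an illustration only, not the authors' implementation: the core count, the fixed-point linear model, and the `dispatch`/`infer` helpers are all hypothetical stand-ins for what would run on the NIC's cores.

```python
# Toy sketch of per-core data-plane inference (hypothetical, for illustration).
NUM_CORES = 4  # assumed number of NIC compute cores

# A "trained model" with fixed integer weights, since data-plane hardware
# often lacks floating-point units. Classifies a 4-feature packet as 1
# when the weighted sum exceeds a threshold.
WEIGHTS = [3, -1, 2, 1]
THRESHOLD = 5

def infer(features):
    """Run the tiny linear model on one packet's feature vector."""
    score = sum(w * f for w, f in zip(WEIGHTS, features))
    return 1 if score > THRESHOLD else 0

def dispatch(packet_id):
    """Assign a packet to one of the NIC cores (round-robin here)."""
    return packet_id % NUM_CORES

# Simulate a batch of packets spread across the cores; on real hardware
# each core would process its share in parallel, which is the source of
# the latency and throughput gains the paper reports.
packets = [(i, [i % 4, 1, 2, 0]) for i in range(8)]
per_core = {c: [] for c in range(NUM_CORES)}
for pid, feats in packets:
    per_core[dispatch(pid)].append(infer(feats))
```

With round-robin dispatch, the eight packets split evenly, two per core, so the batch completes in roughly the time one core needs for two inferences rather than eight.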
format Article
id doaj-art-f1f4af7cbf404d1eaa3b5ae61253ef82
institution Kabale University
issn 2624-9898
language English
publishDate 2025-01-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Computer Science
spelling doaj-art-f1f4af7cbf404d1eaa3b5ae61253ef82 2025-01-06T06:59:14Z eng Frontiers Media S.A. Frontiers in Computer Science 2624-9898 2025-01-01 6 10.3389/fcomp.2024.1493399 1493399 ML-NIC: accelerating machine learning inference using smart network interface cards
Raghav Kapoor (Cloud Lab, Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA, United States)
David C. Anastasiu (Anastasiu Lab, Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA, United States)
Sean Choi (Cloud Lab, Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA, United States)
https://www.frontiersin.org/articles/10.3389/fcomp.2024.1493399/full
machine learning; SmartNIC; Netronome; data plane; inference
spellingShingle Raghav Kapoor
David C. Anastasiu
Sean Choi
ML-NIC: accelerating machine learning inference using smart network interface cards
Frontiers in Computer Science
machine learning
SmartNIC
Netronome
data plane
inference
title ML-NIC: accelerating machine learning inference using smart network interface cards
title_full ML-NIC: accelerating machine learning inference using smart network interface cards
title_fullStr ML-NIC: accelerating machine learning inference using smart network interface cards
title_full_unstemmed ML-NIC: accelerating machine learning inference using smart network interface cards
title_short ML-NIC: accelerating machine learning inference using smart network interface cards
title_sort ml nic accelerating machine learning inference using smart network interface cards
topic machine learning
SmartNIC
Netronome
data plane
inference
url https://www.frontiersin.org/articles/10.3389/fcomp.2024.1493399/full
work_keys_str_mv AT raghavkapoor mlnicacceleratingmachinelearninginferenceusingsmartnetworkinterfacecards
AT davidcanastasiu mlnicacceleratingmachinelearninginferenceusingsmartnetworkinterfacecards
AT seanchoi mlnicacceleratingmachinelearninginferenceusingsmartnetworkinterfacecards