ML-NIC: accelerating machine learning inference using smart network interface cards


Bibliographic Details
Main Authors: Raghav Kapoor, David C. Anastasiu, Sean Choi
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-01-01
Series: Frontiers in Computer Science
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/fcomp.2024.1493399/full
author Raghav Kapoor
David C. Anastasiu
Sean Choi
author_facet Raghav Kapoor
David C. Anastasiu
Sean Choi
author_sort Raghav Kapoor
collection DOAJ
description Low-latency inference for machine learning models is increasingly becoming a necessary requirement, as these models are used in mission-critical applications such as autonomous driving, military defense (e.g., target recognition), and network traffic analysis. A widely studied and used technique to overcome this challenge is to offload some or all of the inference tasks onto specialized hardware such as graphics processing units. More recently, offloading machine learning inference onto programmable network devices, such as programmable network interface cards or programmable switches, has been gaining interest in both industry and academia, especially due to the latency reduction and computational benefits of performing inference directly on the data plane, where network packets are processed. Yet, current approaches are relatively limited in scope, and there is a need for more general approaches to mapping machine learning models onto programmable network devices. To fulfill this need, this work introduces a novel framework, called ML-NIC, for deploying trained machine learning models onto the data planes of programmable network devices. ML-NIC deploys models directly into the computational cores of the devices to efficiently leverage their inherent parallelism, thus providing large latency and throughput gains. Our experiments show that ML-NIC reduced inference latency by at least 6× both on average and at the 99th percentile, and increased throughput by at least 16×, with little to no degradation in model effectiveness compared to existing CPU solutions. In addition, ML-NIC can provide tighter guaranteed latency bounds in the presence of other network traffic, with shorter tail latencies. Furthermore, ML-NIC reduces CPU utilization by 6.65% and host server RAM usage by 320.80 MB. Finally, ML-NIC can handle machine learning models that are 2.25× larger than those supported by current state-of-the-art network device offloading approaches.
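The approach the abstract describes (dispatching packets across a SmartNIC's compute cores, each running a copy of the trained model on the data plane) can be sketched in simplified form. The following Python toy is an illustration only, not the authors' implementation: the core count, the fixed-point linear model, and the `dispatch`/`infer` helpers are all hypothetical stand-ins for what would run on the NIC's cores.

```python
# Toy sketch of per-core data-plane inference (hypothetical, for illustration).
NUM_CORES = 4  # assumed number of NIC compute cores

# A "trained model" with fixed integer weights, since data-plane hardware
# often lacks floating-point units. Classifies a 4-feature packet as 1
# when the weighted sum exceeds a threshold.
WEIGHTS = [3, -1, 2, 1]
THRESHOLD = 5

def infer(features):
    """Run the tiny linear model on one packet's feature vector."""
    score = sum(w * f for w, f in zip(WEIGHTS, features))
    return 1 if score > THRESHOLD else 0

def dispatch(packet_id):
    """Assign a packet to one of the NIC cores (round-robin here)."""
    return packet_id % NUM_CORES

# Simulate a batch of packets spread across the cores; on real hardware
# each core would process its share in parallel, which is the source of
# the latency and throughput gains the paper reports.
packets = [(i, [i % 4, 1, 2, 0]) for i in range(8)]
per_core = {c: [] for c in range(NUM_CORES)}
for pid, feats in packets:
    per_core[dispatch(pid)].append(infer(feats))
```

With round-robin dispatch, the eight packets split evenly, two per core, so the batch completes in roughly the time one core needs for two inferences rather than eight.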
format Article
id doaj-art-f1f4af7cbf404d1eaa3b5ae61253ef82
institution Kabale University
issn 2624-9898
language English
publishDate 2025-01-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Computer Science
spelling doaj-art-f1f4af7cbf404d1eaa3b5ae61253ef82 2025-01-06T06:59:14Z eng Frontiers Media S.A. Frontiers in Computer Science 2624-9898 2025-01-01 6 10.3389/fcomp.2024.1493399 1493399 ML-NIC: accelerating machine learning inference using smart network interface cards
Raghav Kapoor (Cloud Lab, Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA, United States)
David C. Anastasiu (Anastasiu Lab, Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA, United States)
Sean Choi (Cloud Lab, Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA, United States)
https://www.frontiersin.org/articles/10.3389/fcomp.2024.1493399/full
machine learning; SmartNIC; Netronome; data plane; inference
spellingShingle Raghav Kapoor
David C. Anastasiu
Sean Choi
ML-NIC: accelerating machine learning inference using smart network interface cards
Frontiers in Computer Science
machine learning
SmartNIC
Netronome
data plane
inference
title ML-NIC: accelerating machine learning inference using smart network interface cards
title_full ML-NIC: accelerating machine learning inference using smart network interface cards
title_fullStr ML-NIC: accelerating machine learning inference using smart network interface cards
title_full_unstemmed ML-NIC: accelerating machine learning inference using smart network interface cards
title_short ML-NIC: accelerating machine learning inference using smart network interface cards
title_sort ml nic accelerating machine learning inference using smart network interface cards
topic machine learning
SmartNIC
Netronome
data plane
inference
url https://www.frontiersin.org/articles/10.3389/fcomp.2024.1493399/full
work_keys_str_mv AT raghavkapoor mlnicacceleratingmachinelearninginferenceusingsmartnetworkinterfacecards
AT davidcanastasiu mlnicacceleratingmachinelearninginferenceusingsmartnetworkinterfacecards
AT seanchoi mlnicacceleratingmachinelearninginferenceusingsmartnetworkinterfacecards