Efficient Data Placement in Deduplication Enabled ZenFS via CRC-Based Prediction

Bibliographic Details
Main Authors: Safdar Jamil, Joseph Ro, Joo-Young Hwang, Youngjae Kim
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10807169/
Description
Summary: The Zoned Namespace (ZNS) interface shifts data management responsibility to upper-level applications, requiring them to reclaim space by issuing the zone-reset command to ZNS SSD devices, a process known as garbage collection (GC). Application-level GC can lead to performance degradation due to the high valid data copy overhead, which is further exacerbated by the larger GC units in ZNS SSDs. However, the impact of larger GC units can be mitigated if GC operations are made interruptible, allowing I/O requests to be served during zone resets or block reclamation. Moreover, the adoption of offline data deduplication as a storage optimization technique in ZNS-based file systems like ZenFS presents additional challenges. Offline deduplication must consider lifetime-based file allocation to avoid deduplicating hot data, and placing unique and duplicate data blocks together can further increase valid data copy overhead during GC. To address these issues, we propose DeZNS, an innovative data placement strategy for deduplication-enabled ZenFS. DeZNS tackles the increased valid data copy overhead during GC in offline deduplication by employing a lightweight CRC32 checksum-based method to predict potential duplicates with minimal performance impact, segregating unique and duplicate data blocks. This segregation reduces valid data migration overhead during GC, while the interruptible GC mechanism ensures that ongoing I/O requests are not delayed during zone resets, maintaining ZenFS performance. Additionally, DeZNS integrates an offline deduplication module that operates on segregated zones. Our extensive evaluation shows that DeZNS reduces valid data migration by 28% compared to baseline ZenFS and by up to 2× compared to naive offline deduplication in micro-benchmarks.
ISSN:2169-3536
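
Illustration: The summary describes using a lightweight CRC32 checksum to predict likely duplicates at write time and to keep unique and duplicate blocks in separate zones. The Python sketch below shows that general idea only. It is not the DeZNS implementation, whose details this record does not give. The names `CrcPlacementPredictor` and `place_blocks`, the 4 KiB block size, and the two logical zone labels are assumptions made for illustration.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size; the record does not state one


class CrcPlacementPredictor:
    """Hypothetical helper: predicts duplicates from CRC32 checksums."""

    def __init__(self):
        # Checksums of blocks seen so far.
        self.seen = set()

    def classify(self, block: bytes) -> str:
        # CRC32 is cheap but can collide, so a match is only a prediction.
        checksum = zlib.crc32(block)
        if checksum in self.seen:
            return "duplicate"
        self.seen.add(checksum)
        return "unique"


def place_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Split data into blocks and record each block's offset under the
    logical zone ("unique" or "duplicate") the predictor assigns it to."""
    predictor = CrcPlacementPredictor()
    zones = {"unique": [], "duplicate": []}
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        zones[predictor.classify(block)].append(offset)
    return zones


# Three blocks; the third repeats the first.
data = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
zones = place_blocks(data)
```

In this sketch the first two offsets land in the "unique" zone and the repeated block lands in the "duplicate" zone. Because a CRC32 match is only a prediction, a later deduplication pass would still need to confirm that the contents are actually identical.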