Design, Implementation, and Analysis of High-Speed Single-Stage N-Sorters and N-Filters
There is strong interest in developing high-performance hardware sorting systems which can sort a set of elements as quickly as possible. The fastest of the current FPGA systems are sorting networks, in which sets of 2-sorters operate in parallel in each series stage of a multi-stage sorting process...
Saved in:
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
IEEE
2021-01-01
|
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/9308909/ |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | There is strong interest in developing high-performance hardware sorting systems which can sort a set of elements as quickly as possible. The fastest of the current FPGA systems are sorting networks, in which sets of 2-sorters operate in parallel in each series stage of a multi-stage sorting process. A 2-sorter is a single-stage hardware block which sorts two values, so any list with more than 2 values must be sorted with a series network of 2-sorters. A primary contribution of this work is to provide a general methodology for the design of stable single-stage hardware sorters which sort more than 2 values simultaneously. This general methodology for <inline-formula> <tex-math notation="LaTeX">$N$ </tex-math></inline-formula>-sorter design, with <inline-formula> <tex-math notation="LaTeX">$N{>}2$ </tex-math></inline-formula>, is then adapted for use in modern FPGAs, where it is shown that single-stage 3-sorters up to 9-sorters have speedup ratios from 2.0 to 3.5 versus the comparable state-of-the-art 2-sorter networks. A design system modification is shown to produce even faster single-stage <inline-formula> <tex-math notation="LaTeX">$N$ </tex-math></inline-formula>-max and <inline-formula> <tex-math notation="LaTeX">$N$ </tex-math></inline-formula>-min filters. When used for max pooling 32-bit data in the fastest analyzed FPGA, a single 9-max filter will process 500 million 9-pixel groups per second (4K:3840x2160 at 500 frames/second). The single-stage 9-median filter using this design methodology, useful in image processing, is shown to have speedup ratios of 3.0 to 4.1 versus state-of-the-art FPGA network implementations, even though its resource usage is comparable to, often better than, the network implementations. Ten 8-bit 9-median filters operating in parallel in the fastest FPGA will process over 5.4 billion pixels/sec (4K at over 600 frames/second). |
---|---|
ISSN: | 2169-3536 |