PLORC: A Pipelined Lossless Reference-Free Compression Architecture for FASTQ Files
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/10/5582 |
| Summary: | The rapid growth of genomic sequence datasets in FASTQ format calls for efficient storage and transmission solutions. Compression–decompression algorithms for streaming applications are a promising way to address these challenges. In this paper, we present a novel Pipelined Lossless Reference-free Compression (PLORC) architecture designed specifically for streaming genomic data in FASTQ format. The proposed PLORC architecture consists of several submodules optimized for the structure of FASTQ files, balancing compression ratio (CR) against throughput rate (TPR). To verify the architecture in hardware, we implemented the PLORC compressor and decompressor on a field-programmable gate array (FPGA). Experimental results across various open-source genomic datasets show that the PLORC compressor achieved a throughput rate of about 440 MB/s, higher than that of the tested Gzip, LZ4, and Zstd compressors, and that the PLORC decompressor matched the compressor's throughput rate. PLORC also achieved compression ratios competitive with those of some well-known non-streaming compression algorithms. |
|---|---|
| ISSN: | 2076-3417 |
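
The summary evaluates compressors by compression ratio (CR) and throughput rate (TPR) and mentions submodules tailored to the structure of FASTQ files. The sketch below is only an illustration of those ideas, not the PLORC design: it splits four-line FASTQ records into identifier, sequence, and quality streams, then measures CR and TPR for Python's standard `gzip` module. It assumes CR is original size divided by compressed size and that 1 MB is 10^6 bytes. The record does not state either convention, and the helper names and sample reads are hypothetical.

```python
import gzip
import time


def split_fastq_streams(text):
    """Split FASTQ text into identifier, sequence, and quality streams.

    Assumes the simple four-line record layout: '@id', bases, '+', qualities.
    """
    lines = text.strip().split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ records must have exactly four lines each")
    ids, seqs, quals = [], [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        ids.append(header[1:])
        seqs.append(seq)
        quals.append(qual)
    return ids, seqs, quals


def compression_ratio(original, compressed):
    """CR as original size over compressed size (assumed convention)."""
    return len(original) / len(compressed)


def throughput_mb_per_s(num_bytes, seconds):
    """TPR in MB/s, taking 1 MB = 10**6 bytes (assumed convention)."""
    return num_bytes / 1e6 / seconds


# Hypothetical sample data: two made-up reads repeated to get a measurable size.
sample = (
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nTTGACCAGTA\n+\nFFFFFFFFFF\n"
) * 2000
raw = sample.encode("ascii")

start = time.perf_counter()
packed = gzip.compress(raw)
elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval

cr = compression_ratio(raw, packed)
tpr = throughput_mb_per_s(len(raw), elapsed)
print(f"gzip baseline: CR = {cr:.2f}, TPR = {tpr:.1f} MB/s")
```

The printed numbers depend on the machine and on this artificial, highly repetitive input, so they are not comparable to the FPGA results reported in the summary.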