*Result*: zDUR: reference-free FASTQ compressor with high compression ratio and speed.
Numanagić I, Bonfield JK, Hach F, Voges J, Ostermann J, Alberti C, et al. Comparison of high-throughput sequencing data compression tools. Nat Methods. 2016;13:1005–8. (DOI: 10.1038/nmeth.4037; PMID: 27776113)
Ochoa I, Asnani H, Bharadia D, Chowdhury M, Weissman T, Yona G. QualComp: a new lossy compressor for quality scores based on rate distortion theory. BMC Bioinformatics. 2013;14:187. (DOI: 10.1186/1471-2105-14-187; PMID: 23758828; PMCID: PMC3698011)
Yu YW, Yorukoglu D, Peng J, Berger B. Quality score compression improves genotyping accuracy. Nat Biotechnol. 2015;33:240–3. (DOI: 10.1038/nbt.3170; PMID: 25748910; PMCID: PMC4439189)
Chandak S, Tatwawadi K, Ochoa I, Hernaez M, Weissman T. SPRING: a next-generation compressor for FASTQ data. Bioinformatics. 2018;35:2674–6. (DOI: 10.1093/bioinformatics/bty1015)
Lan D, Tobler R, Souilmi Y, Llamas B. Genozip: a universal extensible genomic data compressor. Bioinformatics. 2021;37:2225–30. (DOI: 10.1093/bioinformatics/btab102; PMID: 33585897; PMCID: PMC8388020)
Xing Y, Li G, Wang Z, Feng B, Song Z, Wu C. GTZ: a fast compression and cloud transmission tool optimized for FASTQ files. BMC Bioinformatics. 2017;18:549. (DOI: 10.1186/s12859-017-1973-5; PMID: 29297296; PMCID: PMC5751770)
Wang J, Niu Y, Xu T, Ma M, Gao D, Shi G. AMGC: adaptive match-based genomic compression algorithm. ArXiv. 2023;2304.01031.
Bonfield JK, Mahoney MV. Compression of FASTQ and SAM format sequencing data. PLoS ONE. 2013;8:e59190. (DOI: 10.1371/journal.pone.0059190; PMID: 23533605; PMCID: PMC3606433)
Roguski Ł, Deorowicz S. DSRC 2—Industry-oriented compression of FASTQ files. Bioinformatics. 2014;30:221. (DOI: 10.1093/bioinformatics/btu208)
Benoit G, Lemaitre C, Lavenier D, Drezen E, Dayris T, Uricaru R, et al. Reference-free compression of high-throughput sequencing data with a probabilistic de Bruijn graph. BMC Bioinformatics. 2015;16:288. (DOI: 10.1186/s12859-015-0709-7; PMID: 26370285; PMCID: PMC4570262)
Roguski Ł, Ochoa I, Hernaez M, Deorowicz S. FaStore: a space-saving solution for raw sequencing data. Bioinformatics. 2018;34:2748–56. (DOI: 10.1093/bioinformatics/bty205; PMID: 29617939)
Kowalski TM, Grabowski S. PgRC: pseudogenome-based read compressor. Bioinformatics. 2020;36:2082–9. (DOI: 10.1093/bioinformatics/btz919; PMID: 31893286)
Kowalski TM, Grabowski S. PgRC2: engineering the compression of sequencing reads. Bioinformatics. 2025;41:btaf101. (DOI: 10.1093/bioinformatics/btaf101; PMID: 40037801; PMCID: PMC11908645)
Chen S, Chen Y, Wang Z, Qin W, Zhang J, Nand H, et al. Efficient sequencing data compression and FPGA acceleration based on a two-step framework. Front Genet. 2023;14:1260531. (DOI: 10.3389/fgene.2023.1260531; PMID: 37811144; PMCID: PMC10552150)
Ji F, Zhou Q, Ruan J, Zhu Z, Liu X. A compressive seeding algorithm in conjunction with reordering-based compression. Bioinformatics. 2024;40:btae100. (DOI: 10.1093/bioinformatics/btae100; PMID: 38377404; PMCID: PMC10955252)
*Further Information*
*Background: High-throughput sequencing technologies generate massive amounts of FASTQ data comprising nucleotide sequences, quality scores, and read identifiers, necessitating efficient compression to alleviate storage and transmission burdens. Compared to general-purpose compressors, specialized FASTQ compressors achieve higher compression performance by exploiting the inherent redundancy in FASTQ files. However, existing FASTQ-specialized compressors often suffer from limited data applicability and tend to over-optimize either compression ratio or compression speed at the expense of the other.
Results: We present zDUR, a reference-free FASTQ compressor designed for efficient and scalable handling of next-generation sequencing data across diverse platforms and sequencing data types. Benchmarking against six reference-free compressors on 15 representative datasets spanning four sequencing data types shows that zDUR achieves a favorable overall balance between compression ratio and speed, with broad applicability across data types. In particular, on single-cell RNA-seq and spatial transcriptomics datasets, zDUR runs more than ten times faster than SPRING, one of the state-of-the-art reference-free FASTQ compressors, while maintaining higher compression ratios.
Conclusions: zDUR offers a scalable and efficient solution for reference-free FASTQ compression, balancing performance, speed, and usability across diverse datasets.
(© 2025. The Author(s).)*
*Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The software developed in this work has been registered for software copyright protection and is also covered by a pending patent application.*