Affiliation: College of Computer Science and Software Engineering, Shenzhen University (深圳大学计算机与软件学院)
Source: Journal of Shenzhen University Science and Engineering (《深圳大学学报(理工版)》), 2013, No. 4, pp. 409-415 (7 pages)
Abstract: Advances in high-throughput DNA sequencing technology have caused a sharp increase in the volume of DNA sequencing data, and data compression is one of the important approaches to storing and transmitting these data. This paper surveys traditional DNA sequence compression methods, including substitution-based and statistical methods, as well as reference-genome-based compression methods for high-throughput DNA sequencing data. Representative algorithms for re-sequencing data compression, de novo sequencing data compression, quality score compression, and compressed data indexing are introduced and compared. The challenges and future prospects of high-throughput DNA sequencing data compression are also discussed.