An Algorithm for Sequence Similarity Query with Optimized Multiple Filtering
(Original Chinese title: 一种优化多重过滤的序列查询算法)

Authors: (not listed)

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2010, No. 10, pp. 1785-1796 (12 pages)

Abstract: Sequence data is an important data type that is ubiquitous in applications such as text, Web access logs, and biological databases, and similarity query over such data is an important means of extracting useful information. A key factor in efficient similarity query over a large sequence database is the filtering power of the query algorithm; that is, it is essential to design filters that can quickly discard the sequences that are irrelevant to the query sequence. This paper proposes SSQ_MF, a multiple-filtering algorithm that combines the metric property of sequence distance with characteristics of the sequences themselves. SSQ_MF uses a length filter, a prefix filter, and a reference-based filter, which gives it stronger filtering power than algorithms based on a single filter. In addition, data structures are designed to pre-compute and store statistics about the sequence database; these statistics are used to estimate the filtering set size of each filter, and an optimal filtering-order model determined by these sizes is constructed so that the algorithm's filtering cost is minimized. Experimental results show that SSQ_MF achieves better query performance than single-filter algorithms and than multiple-filter algorithms that apply their filters in a random order.
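The abstract names three filters and an order-optimization step but gives no details. As an illustration only, here is a minimal Python sketch of a multi-filter range query under assumptions that the abstract does not state: the distance is edit (Levenshtein) distance with a fixed threshold `k`; the prefix filter is the q-gram prefix filter commonly used with edit distance; the reference-based filter prunes with the triangle inequality; and the filter order is chosen by counting how many sampled database strings each filter lets through (most selective first). That last step is a simplification standing in for the paper's cost model, not a reproduction of it. The class `MultiFilterIndex` and its parameters (`q`, `k`, `n_refs`, `seed`) are hypothetical names.

```python
import random
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic program (a metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def tagged_qgrams(s: str, q: int) -> list:
    """q-grams of s; repeats are tagged (gram, occurrence) so the multiset becomes a set."""
    seen = Counter()
    out = []
    for i in range(len(s) - q + 1):
        g = s[i:i + q]
        out.append((g, seen[g]))
        seen[g] += 1
    return out


class MultiFilterIndex:
    """Hypothetical index: length, q-gram prefix and reference-based filters for edit distance <= k."""

    def __init__(self, db, q=2, k=1, n_refs=2, seed=0):
        self.db, self.q, self.k = list(db), q, k
        rng = random.Random(seed)
        self.refs = rng.sample(self.db, min(n_refs, len(self.db)))
        # Pre-computation: each string's prefix and its distances to the reference strings.
        self.prefix = [self._prefix(s) for s in self.db]
        self.ref_dist = [[edit_distance(s, r) for r in self.refs] for s in self.db]
        # Sample of database positions used to estimate each filter's selectivity.
        self.sample = rng.sample(range(len(self.db)), min(50, len(self.db)))

    def _prefix(self, s):
        # Global order on tagged grams: lexicographic (an assumption; any total order is sound).
        grams = sorted(tagged_qgrams(s, self.q))
        return set(grams[:self.q * self.k + 1])

    def _n_grams(self, s):
        return max(0, len(s) - self.q + 1)

    def query(self, qs):
        q, k = self.q, self.k
        q_prefix = self._prefix(qs)
        q_ref = [edit_distance(qs, r) for r in self.refs]

        def length_ok(i):
            # Edit distance is at least the length difference.
            return abs(len(self.db[i]) - len(qs)) <= k

        def prefix_ok(i):
            # Pruning is only sound when at least one string has more than q*k grams.
            if max(self._n_grams(qs), self._n_grams(self.db[i])) <= q * k:
                return True
            return bool(q_prefix & self.prefix[i])

        def reference_ok(i):
            # Triangle inequality: |d(q, r) - d(s, r)| <= d(q, s) for every reference r.
            return all(abs(dq - ds) <= k for dq, ds in zip(q_ref, self.ref_dist[i]))

        filters = [length_ok, prefix_ok, reference_ok]
        # Apply the filter that passes the fewest sampled strings first.
        filters.sort(key=lambda f: sum(f(i) for i in self.sample))
        survivors = [i for i in range(len(self.db)) if all(f(i) for f in filters)]
        # Verify the surviving candidates with the exact distance.
        return sorted(self.db[i] for i in survivors if edit_distance(qs, self.db[i]) <= k)
```

For example, `MultiFilterIndex(["kitten", "sitting", "mitten", "smitten", "bitten", "kitchen"], q=2, k=2).query("kitten")` returns every string within edit distance 2 of `"kitten"`. Because each filter discards only strings whose distance is provably greater than `k`, the result is the same for any filter order; the order changes only how many filter checks run before a string is rejected, which is the cost the abstract's ordering model aims to minimize.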

Keywords: sequence data; similarity query; filter; filtering order; metric space

Field: Automation and Computer Technology
