Document Details - Gdtheory (理论粤军网) | Guangdong Think Tank Information Platform


Effective Clustering Algorithm for Probabilistic Data Stream


Authors: ; ; ;

Source: Journal of Software (《软件学报》), 2009, No. 5, pp. 1313-1328 (16 pages)

Abstract: This paper proposes P-Stream, an effective method for clustering over probabilistic data streams. For the probabilistic (uncertain) tuples in a stream, P-Stream introduces the concepts of strong clusters, transitional clusters, and weak clusters. Building on these concepts, it designs an effective online strategy for selecting candidate clusters, which finds a suitable cluster to which each continuously arriving tuple may belong. Snapshots of the micro-clusters are stored at every checkpoint so that higher-level clustering and evolution analysis can be carried out offline. Finally, an "aggressive" two-tier clustering model is introduced to judge whether the existing first-tier clustering model still fits the probabilistic tuples that have most recently arrived in the stream. The experiments construct probabilistic data streams from the real KDD-CUP'98 and KDD-CUP'99 data sets and from synthetic data sets with changing Gaussian distributions. The results show that P-Stream achieves good clustering quality and a fast processing rate, and that it adapts effectively to evolving data.

Keywords: probabilistic data stream; clustering; evolution analysis

Field: [Automation and Computer Technology]
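
The online stage described in the abstract (probability-weighted micro-clusters, candidate cluster selection, and snapshots at checkpoints) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's P-Stream implementation: the abstract does not give the definitions of strong, transitional, and weak clusters, so the weight thresholds, the nearest-centroid selection rule, the `radius` parameter, and the names `MicroCluster` and `StreamSketch` are all hypothetical. The two-tier model is not covered.

```python
import copy
import math
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    # Probability-weighted summary statistics (an assumed representation).
    weight: float = 0.0                              # sum of existence probabilities
    linear_sum: list = field(default_factory=list)   # per-dimension sum of p * x

    def add(self, point, prob):
        if not self.linear_sum:
            self.linear_sum = [0.0] * len(point)
        self.weight += prob
        for i, x in enumerate(point):
            self.linear_sum[i] += prob * x

    def centroid(self):
        return [s / self.weight for s in self.linear_sum]

    def kind(self, strong=2.0, weak=0.5):
        # Hypothetical thresholds on expected weight; not taken from the paper.
        if self.weight >= strong:
            return "strong"
        if self.weight < weak:
            return "weak"
        return "transitional"


class StreamSketch:
    def __init__(self, radius=1.0, checkpoint_every=3):
        self.radius = radius
        self.checkpoint_every = checkpoint_every
        self.clusters = []
        self.snapshots = []  # (tuples seen so far, deep copy of the clusters)
        self.seen = 0

    def insert(self, point, prob):
        if not 0.0 < prob <= 1.0:
            raise ValueError("existence probability must be in (0, 1]")
        # Candidate selection (assumed rule): nearest centroid within `radius`.
        best, best_dist = None, self.radius
        for cluster in self.clusters:
            d = math.dist(cluster.centroid(), point)
            if d <= best_dist:
                best, best_dist = cluster, d
        if best is None:
            # No candidate is close enough, so the tuple starts a new cluster.
            best = MicroCluster()
            self.clusters.append(best)
        best.add(point, prob)
        self.seen += 1
        if self.seen % self.checkpoint_every == 0:
            # Store a snapshot so later inserts do not alter the saved state.
            self.snapshots.append((self.seen, copy.deepcopy(self.clusters)))
        return best


# Small demo: each stream element is (point, existence probability).
s = StreamSketch(radius=1.0, checkpoint_every=3)
stream = [((0.0, 0.0), 0.9), ((0.2, 0.1), 0.8), ((5.0, 5.0), 0.3), ((0.1, 0.0), 0.7)]
for point, prob in stream:
    s.insert(point, prob)
print([c.kind() for c in s.clusters])
```

Weighting every statistic by the tuple's existence probability is one simple way to let uncertain tuples contribute less to a cluster than near-certain ones; the stored snapshots are what an offline stage would read for higher-level clustering.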
