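Affiliation: School of Mechano-Electronic Engineering, Xidian University
Source: Journal of Northwest University (Natural Science Edition), 2005, No. 2, pp. 155-158 (4 pages)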
Abstract: Aim: Based on a study of commonly used clustering algorithms, to propose DBTC (density-based text clustering), an algorithm suited to clustering large-scale Chinese text datasets. Methods: Chinese text datasets are clustered with DBTC, an improved algorithm built on the widely used DBSCAN algorithm. Results: DBTC can discover clusters of arbitrary shape, and its accuracy on Chinese text clustering exceeds 80%. Conclusion: Algorithm analysis and experiments show that DBTC is better suited to large-scale datasets than the basic DBSCAN algorithm.
Subject area: [Automation and Computer Technology]
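
For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of basic DBSCAN applied to bag-of-words text vectors. It is not the DBTC algorithm: the abstract does not describe DBTC's modifications, so none are reproduced here. The cosine distance, the parameter names `eps` and `min_pts`, the toy documents, and the assumption that texts are already segmented into whitespace-separated tokens (Chinese text would need a word-segmentation step first) are all illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def region_query(vectors, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [
        j for j in range(len(vectors))
        if cosine_distance(vectors[i], vectors[j]) <= eps
    ]


def dbscan(vectors, eps, min_pts):
    """Basic DBSCAN: returns one cluster label per vector, NOISE for outliers."""
    labels = [UNVISITED] * len(vectors)
    cluster = 0
    for i in range(len(vectors)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(vectors, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(vectors, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels


# Hypothetical, pre-tokenized toy documents (not data from the paper).
docs = [
    "stock market price rise",
    "stock market price fall",
    "market stock trading price",
    "football match goal score",
    "football match goal win",
    "match football score team",
    "recipe garlic noodle soup",
]
vectors = [Counter(d.split()) for d in docs]
labels = dbscan(vectors, eps=0.6, min_pts=3)
print(labels)
```

Each `region_query` call scans every document, so this naive version costs quadratic time in the number of documents. That cost is one reason basic DBSCAN struggles on large collections, which is the problem the abstract says DBTC targets.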