Affiliation: School of Computer Science and Engineering, South China University of Technology (华南理工大学计算机科学与工程学院)
Source: 《计算机科学》 (Computer Science), 2012, No. 4, pp. 236-239 (4 pages)
Abstract: To obtain good clustering results on data sets in which only a small number of samples are labeled, this paper proposes a semi-supervised clustering algorithm based on graph contraction. First, all data in the sample space are represented as a weighted graph. The graph is then modified by edge contraction according to the given must-link constraints, which strengthens those constraints. On this basis, the graph Laplacian is introduced and combined with the cannot-link constraints to project the sample space onto a feature subspace. Finally, cluster analysis is carried out in that subspace. Experimental results show that the method improves clustering results on complex data and still performs well when the number of pairwise constraints is small.
Keywords: semi-supervised clustering; graph Laplacian; cluster analysis; sample space; machine learning
Field: [Automation and Computer Technology]
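Illustrative sketch: The pipeline outlined in the abstract (weighted graph, edge contraction along must-link pairs, graph Laplacian embedding, clustering in the subspace) might look roughly like the following. This is not the authors' implementation. The Gaussian similarity, the `sigma` parameter, the choice to sum the weights of parallel edges after contraction, the unnormalized Laplacian, and the small k-means routine are all assumptions made for illustration, and all function names are hypothetical. The abstract does not say how cannot-link constraints enter the projection, so that step is left out here.

```python
import numpy as np

def contract_must_links(W, must_links):
    """Merge must-linked vertices into super-nodes (edge contraction)."""
    n = W.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_links:
        parent[find(a)] = find(b)
    roots = sorted({find(i) for i in range(n)})
    index = {r: g for g, r in enumerate(roots)}
    node_of = np.array([index[find(i)] for i in range(n)])
    # Membership matrix: M[i, g] = 1 if vertex i belongs to super-node g.
    M = np.zeros((n, len(roots)))
    M[np.arange(n), node_of] = 1.0
    Wc = M.T @ W @ M           # assumption: parallel edge weights are summed
    np.fill_diagonal(Wc, 0.0)  # drop the self-loops created by contraction
    return Wc, node_of

def spectral_embedding(W, k):
    """Rows of the k eigenvectors with smallest eigenvalues of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    return vecs[:, :k]

def kmeans(X, k, iters=50):
    """Plain Lloyd iterations with deterministic farthest-point seeding."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def contracted_spectral_clustering(X, must_links, k, sigma=1.0):
    """Cluster the rows of X; must-linked points always share a label."""
    d2 = ((X[:, None, :] - X[None]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))  # assumed Gaussian edge weights
    np.fill_diagonal(W, 0.0)
    Wc, node_of = contract_must_links(W, must_links)
    Y = spectral_embedding(Wc, k)
    return kmeans(Y, k)[node_of]  # map super-node labels back to the points
```

Because every must-linked pair collapses into a single vertex before the embedding is computed, the constraint is satisfied by construction and cannot be overridden by the clustering step. That is one way to read the abstract's remark that contraction strengthens the must-link constraints.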