
Research and Application of the Adaptive Affinity Propagation Clustering Algorithm

Advisor: 汪西莉

Discipline code: 081202

Degree conferred: Master's

Author:

Institution: Shaanxi Normal University

Abstract: As the saying goes, "birds of a feather flock together." Cluster analysis is a technique that uses computers to achieve this kind of grouping. It has two basic components: a measure of pattern similarity and a clustering algorithm. The input is a set of unpartitioned data. How the data should be classified is not known in advance, and the number of classes may be unknown as well. By statistically analyzing the relationships among the data and formulating reasonable clustering rules, the method determines the class of each data item and then groups the data into clusters by similarity, so that similarity is high within a cluster and low between clusters.

In 2007 Frey and Dueck proposed a new clustering method called affinity propagation (AP). Unlike K-Means, affinity propagation does not require the number of clusters or the initial cluster centers to be specified in advance, and its final cluster centers are always actual points in the original data rather than averages of several data points (as in K-Means). Experiments have shown that clustering with AP yields a smaller error, among other advantages. The algorithm has already been applied to face image retrieval, gene exon discovery, and optimal airline route search.

Affinity propagation has many advantages over other clustering methods and has achieved some success in practice, but the algorithm is still at an early stage of development and several key problems remain open, in particular: 1) the final number of clusters cannot be known before clustering, and the clustering result is not guaranteed to be optimal; 2) affinity propagation is an unsupervised method and cannot perform semi-supervised learning, that is, use a small number of labeled samples to guide the clustering process; 3) its time and space complexity are severely constrained by the number of samples, so it cannot handle large-scale data such as image segmentation.

This thesis discusses, analyzes, and studies each of these problems in turn and attempts to solve them by combining affinity propagation with other current techniques, such as semi-supervised learning theory and adaptive clustering. The main work is as follows:

(1) Cluster analysis and its categories are described systematically, and methods and applications of cluster analysis in China and abroad are briefly introduced.

(2) The idea, clustering process, and applications of the affinity propagation algorithm are studied in depth, and the current state of research on the algorithm, together with its open problems and challenges, is described.

(3) The main clustering evaluation functions currently in use are introduced in detail, including external, internal, and relative evaluation methods. The characteristics of representative evaluation methods and their role in optimizing a partition are described, and issues in applying clustering evaluation methods are summarized.

(4) To address the difficulty of obtaining an optimal clustering result with affinity propagation, a semi-supervised adaptive affinity propagation clustering algorithm (SAAP) is proposed. It incorporates information from a small number of labeled samples and starts from the relationship between the preference parameter and the number of clusters, adaptively scanning the space of valid cluster numbers and finally selecting the optimal clustering result according to an evaluation function. This addresses the algorithm's shortcomings of low clustering accuracy, slow computation, and a final number of clusters that does not match the true one.

(5) To address the unsuitability of affinity propagation for large-scale data, and image segmentation in particular, a method for segmenting large color images based on affinity propagation is proposed. The original image is first transformed to another color space and then sampled. The sampled data are clustered by affinity propagation with a given number of clusters (APGNC), the clustering result is extended to the whole image, and finally regions of the clustering result are merged using morphological methods to obtain a refined segmentation. This addresses the difficulty affinity propagation has with large color images and its poor segmentation quality.
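The relationship between the preference parameter and the number of clusters, which item (4) of the abstract takes as its starting point, can be illustrated with a small sketch. The code below is not SAAP or APGNC: it is plain affinity propagation using the standard responsibility and availability updates from Frey and Dueck, followed by a simple sweep over preference values. The function names (`neg_sq_dist`, `affinity_propagation`, `sweep_preferences`), the damping factor of 0.5, the fixed iteration count, and the synthetic data are all assumptions made for illustration.

```python
import random


def neg_sq_dist(points):
    """Similarity matrix: negative squared Euclidean distance between points."""
    return [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def affinity_propagation(S, preference, damping=0.5, max_iter=200):
    """Plain AP on an n x n similarity matrix (n >= 2).

    Runs a fixed number of damped iterations (no convergence check) and
    returns (exemplar_indices, labels); labels are all -1 if no exemplar
    emerged.
    """
    n = len(S)
    # Copy S and place the shared preference on the diagonal.
    S = [[preference if i == k else S[i][k] for k in range(n)] for i in range(n)]
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i,k) <- s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            first = scores[best]
            second = max(v for k, v in enumerate(scores) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) <- min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) <- sum over i' != k of max(0, r(i',k))
        for k in range(n):
            pos = [max(R[i][k], 0.0) for i in range(n)]
            total = sum(pos) - pos[k]  # positive parts, excluding row k
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # A point is an exemplar when its self-evidence a(k,k) + r(k,k) is positive.
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    labels = [-1] * n
    if exemplars:
        for i in range(n):
            labels[i] = max(range(len(exemplars)),
                            key=lambda c: S[i][exemplars[c]])
        for c, k in enumerate(exemplars):
            labels[k] = c  # every exemplar belongs to its own cluster
    return exemplars, labels


def sweep_preferences(S, preferences):
    """Run AP once per preference value and report the cluster count for each."""
    return [(p, len(affinity_propagation(S, p)[0])) for p in preferences]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data (an assumption): three 2-D blobs of 10 points each.
    pts = [(rng.gauss(c, 0.3), rng.gauss(c, 0.3))
           for c in (0.0, 4.0, 8.0) for _ in range(10)]
    S = neg_sq_dist(pts)
    off_diag = sorted(S[i][k] for i in range(len(S))
                      for k in range(len(S)) if i != k)
    candidates = [off_diag[0], off_diag[len(off_diag) // 2], 0.0]
    for p, count in sweep_preferences(S, candidates):
        print(f"preference={p:10.2f}  clusters={count}")
```

With a preference of 0 and all other similarities negative, every point becomes its own exemplar, while lower preferences generally produce fewer clusters. A method such as the one the abstract describes would go further, using labeled samples and an evaluation function to choose among the candidate clusterings; that selection step is not reproduced here.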

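Keywords: affinity propagation; adaptive; semi-supervised; image segmentation; region merging; evaluation function

Classification number: [TP311.13]

Field: [Automation and Computer Technology]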
