Affiliation: School of Science, South China University of Technology
Source: Science Technology and Engineering (《科学技术与工程》), 2007, No. 21, pp. 5563-5566 (4 pages)
Abstract: Training on large-scale samples is an important problem in support vector machine (SVM) research. Based on data partition and a neighborhood pair strategy, a new SVM classification algorithm is proposed. The algorithm first applies c-means clustering separately to the positive and negative classes of the training set, partitioning the large data set into mutually disjoint subsets. Each positive-class subset is then paired with each negative-class subset, yielding m×n binary classification problems, each of which is solved with the SMO algorithm. Finally, under the neighborhood pair strategy, each unknown sample is identified by the binary classifier built from the two subsets, one from each class, that are nearest to that sample. To verify its effectiveness, the algorithm is applied to five benchmark UCI data sets and compared with the SMO algorithm. The results show that the new algorithm greatly reduces the training time for large samples while leaving the test accuracy almost unchanged.
Field: [Automation and Computer Technology]
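
The three steps named in the abstract (per-class clustering, pairwise binary problems, nearest-pair prediction) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `c_means`, `centroid_trainer`, `fit`, and `predict` are hypothetical, "nearest subset" is assumed to mean the smallest Euclidean distance to a cluster center, and the binary trainer is a simple nearest-mean stand-in rather than SMO. A real SMO-based SVM trainer could be passed in through `train_binary`.

```python
from math import dist  # Python 3.8+

def c_means(points, c, iters=20):
    """Lloyd-style c-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < c:
        centers.append(max(points, key=lambda p: min(dist(p, q) for q in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            groups[nearest].append(p)
        # Recompute each center as its group mean; keep the old center if the group is empty.
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) if g else ctr
                   for g, ctr in zip(groups, centers)]
    return centers, groups

def centroid_trainer(pos, neg):
    """Stand-in binary trainer (NOT SMO): label by the nearer subset mean."""
    mp = tuple(sum(x) / len(pos) for x in zip(*pos))
    mn = tuple(sum(x) / len(neg) for x in zip(*neg))
    return lambda x: 1 if dist(x, mp) <= dist(x, mn) else -1

def fit(pos, neg, m, n, train_binary=centroid_trainer):
    """Cluster each class separately, then train one classifier per subset pair."""
    pos_centers, pos_groups = c_means(pos, m)
    neg_centers, neg_groups = c_means(neg, n)
    classifiers = {(i, j): train_binary(pos_groups[i], neg_groups[j])
                   for i in range(m) for j in range(n)}  # m*n binary problems
    return pos_centers, neg_centers, classifiers

def predict(model, x):
    """Neighborhood pair: use the classifier of the nearest subset from each class."""
    pos_centers, neg_centers, classifiers = model
    i = min(range(len(pos_centers)), key=lambda k: dist(x, pos_centers[k]))
    j = min(range(len(neg_centers)), key=lambda k: dist(x, neg_centers[k]))
    return classifiers[(i, j)](x)

# Toy XOR-like data (made up for illustration only).
pos = [(0, 0), (1, 0), (0, 1), (10, 10), (9, 10), (10, 9)]
neg = [(0, 10), (1, 10), (0, 9), (10, 0), (9, 0), (10, 1)]
model = fit(pos, neg, m=2, n=2)
print([predict(model, x) for x in [(1, 2), (9, 8), (2, 9), (8, 1)]])
```

Each pairwise problem sees only two subsets rather than the whole training set, which is where the training-time saving described in the abstract would come from. The toy data is not linearly separable as a whole, yet each local pair is, which is the situation the pairwise design is meant to exploit.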