Author:
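Affiliation: School of Electronic and Information Engineering, Foshan University (佛山科学技术学院)
Source: Journal of Foshan University (Natural Science Edition) (《佛山科学技术学院学报(自然科学版)》), 2013, No. 5, pp. 22-26 (5 pages)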
Abstract: This paper proposes a classification algorithm that combines an improved random subspace method (RSM) with the C4.5 decision tree algorithm. Decision trees built with C4.5 serve as the base classifiers of the ensemble. At the start of each iteration, as in RSM, a subset of the features is removed from the training data. SMOTE sampling is then applied to the reduced dataset, generating synthetic examples that differ markedly in feature space and data distribution, and the resulting balanced dataset is used to train the base classifier. In this way the base classifiers receive diverse, balanced training sets. The final decision is obtained by fusing the base classifiers' outputs through majority voting. Experimental results show that the proposed method achieves a high recognition rate on both the minority and majority classes, performs better than the other approaches compared, and is effective and feasible for imbalanced datasets.
Keywords: imbalanced data classification; random subspace method; decision tree; ensemble learning
Field: [Automation and Computer Technology]
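
The procedure described in the abstract (drop a random subset of features, apply SMOTE to the reduced data, train a tree, combine by majority vote) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. It assumes a binary problem with numeric features, and it uses a small entropy-based threshold tree as a stand-in for C4.5 (no gain ratio, no pruning). All function names and parameters (`smote`, `build_tree`, `fit_ensemble`, `subspace_size`, `k`, `max_depth`) are hypothetical.

```python
import math
import random
from collections import Counter

def smote(minority, n_new, k, rng):
    """Make n_new synthetic points, each interpolated between a random
    minority seed and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        seed = rng.choice(minority)
        others = [p for p in minority if p is not seed]
        neighbours = sorted(others, key=lambda p: math.dist(seed, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([s + gap * (m - s) for s, m in zip(seed, mate)])
    return synthetic

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=3):
    """Entropy-based threshold tree (stand-in for C4.5, an assumption).
    A leaf is a class label; an inner node is (feature, threshold, left, right)."""
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    best = None
    for f in range(len(X[0])):
        # Skipping the largest value keeps both sides of every split non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every feature is constant: no split possible
        return Counter(y).most_common(1)[0][0]
    _, f, t = best
    lo = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    hi = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return (f, t,
            build_tree([r for r, _ in lo], [l for _, l in lo], depth + 1, max_depth),
            build_tree([r for r, _ in hi], [l for _, l in hi], depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):  # class labels must not be tuples
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_ensemble(X, y, n_estimators=11, subspace_size=2, k=3, seed=0):
    """Per iteration: random feature subset, then SMOTE, then one tree."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority_label = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority_label]
    ensemble = []
    for _ in range(n_estimators):
        feats = rng.sample(range(len(X[0])), subspace_size)
        Xs = [[row[f] for f in feats] for row in X]
        minority = [row for row, lab in zip(Xs, y) if lab == minority_label]
        Xb = Xs + smote(minority, deficit, k, rng)  # balance the two classes
        yb = list(y) + [minority_label] * deficit
        ensemble.append((feats, build_tree(Xb, yb)))
    return ensemble

def predict(ensemble, row):
    """Majority vote over the trees, each seeing only its own feature subset."""
    votes = Counter(predict_tree(tree, [row[f] for f in feats])
                    for feats, tree in ensemble)
    return votes.most_common(1)[0][0]

# Toy usage: 20 majority rows (label 0) and 5 minority rows (label 1).
X = [[((i * 7 + j * 3) % 10) / 10 for j in range(4)] for i in range(20)] \
    + [[5 + ((i * 3 + j) % 5) / 5 for j in range(4)] for i in range(5)]
y = [0] * 20 + [1] * 5
model = fit_ensemble(X, y)
print(predict(model, [0.5, 0.5, 0.5, 0.5]), predict(model, [5.5, 5.5, 5.5, 5.5]))
```

Because SMOTE runs after the features are removed, each tree sees synthetic minority examples interpolated in its own subspace, which is one plausible reading of how the method obtains training sets that differ in both feature space and data distribution. An odd `n_estimators` avoids tied votes in the binary case.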