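Author:
Affiliation: Department of Electronic Commerce, School of Economics and Trade, South China University of Technology
Source: 《计算机应用与软件》 (Computer Applications and Software), 2009, No. 8, pp. 144-146, 161 (4 pages)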
Abstract: An imbalanced data set is one in which the samples of a certain class are markedly fewer than the samples of the other classes. When such data are classified, traditional classification algorithms tend to favour the majority class, which lowers the classification precision on the minority class. To address imbalance in text data, this paper first applies the weight-retouching method for feature extraction and then classifies the text with an under-sampling support vector machine (SVM) method. Experiments show that combining weight-retouching with under-sampling SVM improves the classification precision on imbalanced data.
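The under-sampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which under-sampling strategy the paper uses, so this example assumes plain random under-sampling that shrinks every class to the size of the smallest one. The function name `undersample`, the `seed` parameter, and the toy documents are hypothetical, and neither the weight-retouching feature extraction nor the SVM training is shown.

```python
import random
from collections import defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop samples so that every class keeps as many
    samples as the smallest class (assumed strategy, not the paper's)."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # The minority class size is the target size for every class.
    target = min(len(group) for group in by_class.values())

    balanced_samples, balanced_labels = [], []
    for label, group in by_class.items():
        kept = rng.sample(group, target)  # sampling without replacement
        balanced_samples.extend(kept)
        balanced_labels.extend([label] * target)
    return balanced_samples, balanced_labels


# Toy imbalanced corpus: 8 majority documents, 2 minority documents.
docs = [f"majority doc {i}" for i in range(8)] + ["minority doc 0", "minority doc 1"]
labels = ["majority"] * 8 + ["minority"] * 2

balanced_docs, balanced_labels = undersample(docs, labels)
```

In a pipeline of the kind the abstract describes, the balanced documents would then be turned into feature vectors and passed to an SVM trainer. Random under-sampling discards majority-class samples, so it trades some information about the majority class for a less biased decision boundary.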
Field: [Automation and Computer Technology] [Agricultural Science]