帮助 本站公告
您现在所在的位置:网站首页 > 知识中心 > 文献详情
文献详细Journal detailed

基于Boosting的代价敏感软件缺陷预测方法
Cost-sensitive Software Defect Prediction Method Based on Boosting

作  者: (杨杰); (燕雪峰); (张德平);

机构地区: 南京航空航天大学计算机科学与技术学院,南京211106

出  处: 《计算机科学》 2017年第8期176-180,206,共6页

摘  要: Boosting重抽样是常用的扩充小样本数据集的方法,首先针对抽样过程中存在的维数灾难现象,提出随机属性子集选择方法以进行降维处理;进而针对软件缺陷预测对于漏报与误报的惩罚因子不同的特点,在属性选择过程中添加代价敏感算法。以多个基本k-NN预测器为弱学习器,以代价最小为属性删除原则,得到当前抽样集的k值与属性子集的预测器集合,采用代价敏感的权重更新机制对抽样过程中的不同数据实例赋予相应权值,由所有预测器集合构成自适应的集成k-NN强学习器并建立软件缺陷预测模型。基于NASA数据集的实验结果表明,在小样本情况下,基于Boosting的代价敏感软件缺陷预测方法预测的漏报率有较大程度降低,误报率有一定程度增加,整体性能优于原来的Boosting集成预测方法。 Boosting resampling is a common method to expand data sets for small samples.Firstly,aiming at dimension disaster phenomenon during resampling process,a randomly feature selection method is used to reduce the dimensions.In addition,considering the characteristic that software defect prediction's penalties for missing of true positives and the wrongly reported of negatives are different,cost-sensitive algorithm is added in feature selection process.On the basis of multi-normal k-NN weak learning,taking minimum costs as the principle,preditor which consists of k value and attributes subset of the current sampling set is get,cost-sensitive theory is imported to update weight vector during Boosting resampling process,and different instances are given corresponding weights.An adaptive ensemble k-NN learning is constructed using all the predictors,and a software defect prediction model is established.The results using NASA's data sets show that under the condition of small samples,with this model,missing of true positive rate reduces largely and the wrongly reported of negative rate increases to some extent.On the whole,compared with the origen boostingbased learning,the method of cost-sensitive software defect prediction based on boosting greatly improves the prediction effect.

关 键 词: 软件缺陷预测 代价敏感 随机属性选择 集成

相关作者

相关机构对象

相关领域作者

作者 庞菊香
作者 康秋实
作者 康超
作者 廖伟导
作者 廖刚