Affiliation: College of Informatics, South China Agricultural University
Source: Journal of Zhengzhou University (Natural Science Edition), 2009, No. 1, pp. 77-80 (4 pages)
Abstract: For categorical data, a measure of feature significance based on information entropy is proposed. Combined with cluster analysis, this measure yields an unsupervised feature selection method. The time complexity of the method is approximately linear in both the size of the dataset and the number of features, which makes it suitable for feature selection on large datasets. Experimental results on UCI datasets show that the method performs well and is effective and practical.
Field: [Automation and Computer Technology]
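
Illustration: The abstract does not state the paper's definition of feature significance or how clustering is used, so the following is only a minimal sketch of the building block it names: the Shannon entropy of each categorical feature, computed by counting values. The function names `column_entropy` and `entropy_per_feature` and the toy data are assumptions made for illustration, not the authors' method. Counting touches every cell once, which is consistent with the roughly linear cost in dataset size and number of features that the abstract reports.

```python
from collections import Counter
from math import log2


def column_entropy(values):
    """Shannon entropy (in bits) of one categorical column."""
    counts = Counter(values)
    n = len(values)
    # H = sum over categories of p * log2(1 / p), with p = count / n
    return sum((c / n) * log2(n / c) for c in counts.values())


def entropy_per_feature(rows):
    """Entropy of every column; each cell of the table is visited once."""
    return [column_entropy(list(col)) for col in zip(*rows)]


# Hypothetical toy data: three categorical features, four records.
rows = [
    ("red", "small", "yes"),
    ("red", "large", "yes"),
    ("blue", "small", "yes"),
    ("blue", "large", "yes"),
]
scores = entropy_per_feature(rows)
```

In this toy table the third feature takes a single value, so its entropy is zero and it carries no information for separating records; how the paper turns entropy into a significance score and combines it with clustering is not recoverable from the abstract alone.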