Affiliation: School of Economics and Management, Tongji University
Source: 《信息系统学报》 (China Journal of Information Systems), 2012, No. 2, pp. 76-86 (11 pages)
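Abstract: This paper uses statistical machine learning methods to study the selection of feature terms for sentiment classification of Chinese online reviews. Parts of speech, part-of-speech combinations, and N-grams are chosen as candidate feature terms for sentiment texts; the document frequency (DF) method is used to reduce the dimensionality of the feature terms, Boolean weighting is used to construct the feature vectors, and an SVM classifier performs the sentiment classification of the reviews. Finally, experiments are run on online reviews of mobile phones, and a chi-square test is used to assess whether the differences between experimental results are significant. The results show that, for sentiment classification of Chinese online reviews, using adjectives as feature terms yields relatively high classification accuracy and efficiency; when N-grams are used as feature terms, classification accuracy decreases as the order N increases; and the size of the training corpus and the number of feature terms also have a significant effect on classification performance, although larger numbers do not necessarily give higher accuracy.

English abstract: This study applies statistical machine learning methods to feature selection for sentiment classification of Chinese online reviews. Words, various combinations of words, and N-grams are selected as potential sentiment features. Document Frequency is used to reduce dimensionality, the Boolean weighting method to construct vectors, and an SVM classifier to classify the online reviews. Finally, an experimental analysis is carried out on online reviews of mobile phones. The results show that sentiment classification of Chinese online reviews obtains the highest accuracy when adjectives, adverbs and verbs are taken together as the feature. When N-grams are used as the feature, low-order N-grams perform better than high-order N-grams. Training corpus size and feature size each have a distinct impact on classification, but more is not always better.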
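The pipeline summarized in the abstract (candidate feature terms, document-frequency reduction, Boolean weighting, an SVM classifier, and a chi-square test on the results) can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' code: it uses character N-grams as the only feature type (no Chinese word segmenter or part-of-speech tagger is assumed, so adjective features are not shown), `min_df` and `top_k` are hypothetical DF parameters, a Pegasos-style subgradient solver without a bias term stands in for whatever SVM implementation the paper used, and a 2x2 table of correct/incorrect counts is only one assumed way to set up the chi-square comparison. The toy reviews are invented.

```python
import random
from collections import Counter


def char_ngrams(text, n):
    """Set of character n-grams; using a set gives Boolean (presence) semantics."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def select_by_df(docs_feats, min_df=2, top_k=None):
    """Document-frequency reduction: keep features seen in at least min_df documents."""
    df = Counter()
    for feats in docs_feats:
        df.update(feats)  # feats is a set, so each document counts once per feature
    kept = [f for f, c in df.most_common() if c >= min_df]
    if top_k is not None:
        kept = kept[:top_k]  # optional cap on the number of feature terms
    return {f: i for i, f in enumerate(kept)}


def boolean_vector(feats, vocab):
    """Boolean weighting: 1.0 if a selected feature occurs in the document, else 0.0."""
    vec = [0.0] * len(vocab)
    for f in feats:
        if f in vocab:
            vec[vocab[f]] = 1.0
    return vec


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the hinge loss (labels +1/-1, no bias term)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularization step
            if margin < 1.0:  # hinge loss is active, so step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Sign of the linear score; ties (score 0) go to the positive class."""
    score = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if score >= 0 else -1


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]],
    e.g. correct/incorrect counts of two feature configurations."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    # Invented toy "phone reviews" (+1 positive, -1 negative), not the paper's corpus.
    train = [("很好用", 1), ("很好看", 1), ("好用好看", 1),
             ("太差了", -1), ("太卡了", -1), ("差又卡", -1)]
    n = 1  # N-gram order; the abstract reports that lower orders classify better
    feats = [char_ngrams(text, n) for text, _ in train]
    vocab = select_by_df(feats, min_df=2)
    X = [boolean_vector(f, vocab) for f in feats]
    y = [label for _, label in train]
    w = train_linear_svm(X, y)
    for review in ["好看", "太差"]:
        print(review, predict(w, boolean_vector(char_ngrams(review, n), vocab)))
    # 3.841 is the chi-square critical value for df = 1 at alpha = 0.05.
    print(chi_square_2x2(90, 10, 70, 30) > 3.841)
```

Representing each review's features as a set makes DF counting and Boolean weighting fall out naturally, since a term counts at most once per review. Replacing `char_ngrams` with a function that returns the adjectives of a segmented, POS-tagged review would approximate the adjective-feature setting the abstract reports.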