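Affiliation: School of Information Management, Sun Yat-sen University
Source: New Technology of Library and Information Service (《现代图书情报技术》), 2011, No. 10, pp. 24-28 (5 pages)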
Abstract: To address the low efficiency of sentiment word identification and sentiment lexicon construction, the authors propose a method for automatically extracting a set of basic (seed) sentiment words. Seed words are screened by word frequency, domain-specific sentiment orientation, and sentiment intensity, with the definiteness of a word also taken into account. Candidate words are then identified as sentiment words, and their orientation judged, by comparing their semantic similarity to the positive and negative seed sets, computed as the PMI-IR value between the candidate word and each set. This lets the computer complete most of the work automatically, which improves efficiency and reduces the cost of building sentiment lexicons for different domains. In an experiment on a dataset of 71,061 reviews from Jingdong Mall (360buy) and 1,736 reviews from Joyo, the method achieves a recall of 76.36%, a precision of 76.94%, and a precision of 62.70% for sentiment orientation judgment.
Field: [Cultural Sciences: Communication Studies]
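
The orientation step described in the abstract (comparing a candidate word's PMI-based association with a positive and a negative seed set) can be illustrated with a minimal sketch. This is not the authors' implementation: PMI-IR is normally estimated from search-engine hit counts, whereas the sketch below assumes document-level co-occurrence counts in a small tokenized corpus. The function names, the smoothing constant, the seed words, and the toy reviews are all hypothetical.

```python
import math
from typing import List, Set


def pmi(word: str, seed: str, docs: List[Set[str]], smoothing: float = 0.01) -> float:
    """Pointwise mutual information of two words from document co-occurrence.

    Assumption: counts come from a local corpus rather than search-engine
    hits; `smoothing` (hypothetical value) keeps the logarithm finite when
    a count is zero.
    """
    n = len(docs)
    count_word = sum(1 for d in docs if word in d)
    count_seed = sum(1 for d in docs if seed in d)
    count_both = sum(1 for d in docs if word in d and seed in d)
    return math.log2(
        ((count_both + smoothing) * n)
        / ((count_word + smoothing) * (count_seed + smoothing))
    )


def orientation(word: str, pos_seeds: Set[str], neg_seeds: Set[str],
                docs: List[Set[str]]) -> float:
    """Mean PMI with the positive seeds minus mean PMI with the negative seeds.

    Both seed sets are assumed to be non-empty. A positive score suggests a
    positive orientation, a negative score a negative one.
    """
    pos = sum(pmi(word, s, docs) for s in pos_seeds) / len(pos_seeds)
    neg = sum(pmi(word, s, docs) for s in neg_seeds) / len(neg_seeds)
    return pos - neg


# Hypothetical toy corpus: each review reduced to its set of tokens.
REVIEWS: List[Set[str]] = [
    {"good", "fast", "delivery"},
    {"good", "fast", "screen"},
    {"bad", "broken", "screen"},
    {"bad", "slow", "broken"},
    {"good", "cheap"},
]
POSITIVE_SEEDS = {"good"}  # hypothetical seed sets
NEGATIVE_SEEDS = {"bad"}

if __name__ == "__main__":
    for candidate in ("fast", "broken"):
        score = orientation(candidate, POSITIVE_SEEDS, NEGATIVE_SEEDS, REVIEWS)
        print(candidate, "positive" if score > 0 else "negative")
```

Averaging over each seed set, rather than summing, keeps the score comparable when the two sets differ in size; whether the original method sums or averages is not stated in the abstract.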