Affiliation: Shenzhen Tencent Computer Technology Co., Ltd., Shenzhen 518057
Source: 《图书情报知识》 (Documentation, Information & Knowledge), 2011, No. 6, pp. 50-54 (5 pages)
Abstract: Based on the characteristics of online public opinion information sources, this study addresses incremental acquisition, extraction and storage of public opinion information, and text preprocessing. It proposes a targeted information collection strategy based on Web-Harvest and a new-word collection strategy based on an input method platform. An extended lexicon of Internet terms was built, and the key modules for information preprocessing were implemented.
Keywords: online public opinion; information extraction; text preprocessing; Chinese word segmentation; syntactic analysis
Field: [Cultural Science]