Advisor: 王珊 (Wang Shan)
Discipline: H1203
Degree: Doctorate
Author: ;
Institution: Institute of Computing Technology, Chinese Academy of Sciences
Abstract: This dissertation discusses techniques for personalized information dissemination and concept retrieval. Building on existing semantic resources (HowNet and WordNet), it proposes a multi-view concept network model (Concept Network-Views model) as the representation for both documents and user interests, and exploits the relations between concepts to improve retrieval precision. It also takes into account the many factors involved in personalization, performs cluster analysis on user access patterns and feedback, and designs and develops a prototype system framework, making personalized information dissemination more effective. In the concept retrieval research, a domain ontology is used to link semi-structured and structured data, so that user queries receive concept-level results. Document representation is likewise based on the multi-view concept network model; analysis of the HTML file format is used to improve the representation of web page content, and high-performance methods for dictionary access and inverted file indexing are given.
Information retrieval is concerned with selecting documents from a collection that will be of interest to a user with a stated information need or query. How to let users get the information they want is becoming increasingly important in wide-area information systems. This dissertation describes a new Concept Network-Views (CN-V) model for representing documents and user interests; combined with user access pattern analysis and ontology theory, it can overcome several limitations of traditional user interest modeling and information retrieval. The CN-V model is the kernel of this dissertation. Its construction is divided into two phases: (1) transform text from word space into concept space; (2) generate the CN-V model from the concepts. In the first phase, we combine statistical techniques with a rule-based method to perform this transformation, and we use WordNet and HowNet to disambiguate word senses. An extended phrase mining algorithm is presented to extract semantic units from documents. In the second phase, we give the ConceptRank algorithm to extract the topics of documents and user interests. Concepts that have a high ConceptRank but do not contribute to the topic are identified as hub concepts, so that they do not affect efficiency. Finally, we present two similarity measures based on CN-V: an energy decreasing algorithm and the cosine measure of concept vectors.
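As a rough illustration of the second similarity measure named above, the cosine measure of concept vectors can be sketched as follows. This is a minimal sketch and not the dissertation's implementation: the sparse dictionary representation, the function name `cosine_similarity`, and the example concept names and weights are all assumptions made for illustration. The abstract does not say how concept weights are derived.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse concept-weight vectors.

    Each vector is a dict mapping a concept identifier to a non-negative
    weight (a hypothetical representation, not taken from the source).
    """
    # Only concepts present in both vectors contribute to the dot product.
    dot = sum(weight * b[concept] for concept, weight in a.items() if concept in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty vector shares nothing with any other vector.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical document and user-interest vectors.
doc = {"information_retrieval": 3.0, "ontology": 1.0}
user = {"information_retrieval": 2.0, "clustering": 2.0}
score = cosine_similarity(doc, user)
```

A score near 1 would mean the document and the user profile weight the same concepts in similar proportions, and a score of 0 would mean they share no concepts.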
Our research on personalized information dissemination includes three parts: (1) We give a new user interest modeling method based on the CN-V model and analyze the main factors in personalization. (2) By analyzing user access patterns, we give a potential interest mining algorithm: we first collect personal data, which after preprocessing are clustered and represented by the Concept Network-Views model and can then be used for information recommendation. (3) Collaborative information filtering can also support personalized information dissemination. We use ISODATA algor
Field: [Automation and Computer Technology]