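Affiliation: Institute of Systems Engineering, Department of Control Science and Engineering, Huazhong University of Science and Technology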
Source: Systems Engineering and Electronics (《系统工程与电子技术》), 2007, No. 2, pp. 281-284 (4 pages)
Abstract: Topic-specific Web search is a new direction in information retrieval. Rather than collecting and indexing all accessible Web documents, a topic-specific Web search system restricts its crawl boundary and follows the links that are likely to be most relevant to the given topic. Topic-specific information gathering is the crux of the whole system. This paper examines the role of Web metadata in topic-specific information gathering. Based on Web metadata, it designs a topic expansion system and a topic-specific information gathering system, gives the concrete steps for implementing them, and proposes several metadata-based gathering strategies for guiding crawlers to topic-relevant pages. Experiments show that topic-expanded Web metadata can serve as an important criterion for judging a page's topical relevance, and that the heuristic gathering strategy based on the average metadata weight with gain performs well.
Field: [Automation and Computer Technology]