Advisor: Yang Tianqi (杨天奇)
Discipline: 081203
Degree: Master's
Author:
Institution: Jinan University
Abstract: With the rapid development of network and communication technology, the volume of information on the Web has grown explosively into a vast information space, and search engines have become an indispensable tool. A single search engine can typically find no more than half of all the relevant information, so users generally have to query several engines to retrieve what they need with reasonable completeness. Moreover, current search engines present their results as a flat list from which users must pick out the relevant items themselves, which is inconvenient. This thesis therefore applies the suffix tree clustering (STC) algorithm to meta search and designs a meta search engine system with clustering. The user submits a query only once. The meta search engine converts the query and forwards it to several preselected independent search engines, then gathers the results they return, clusters them, builds a category hierarchy, and generates cluster labels. The results are finally presented to the user by category, so that the user can view them at a higher topical level. This greatly reduces the number of results the user has to browse and shortens the time a query takes. Finally, the thesis compares STC with other clustering algorithms and shows experimentally that STC outperforms traditional clustering algorithms in both accuracy and time efficiency.
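The pipeline outlined in the abstract (merge results from several engines, find shared phrases, group overlapping phrase clusters, label each group) can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which the record does not describe in detail. The function names `merge_results`, `base_clusters`, and `merge_clusters`, the `max_len` and `min_docs` parameters, and the 0.5 overlap threshold are all assumptions made for illustration. For brevity the sketch indexes word n-grams in a dictionary instead of building a real suffix tree, so it only approximates STC's base clusters and does not have the linear-time construction of a suffix tree.

```python
from collections import defaultdict
from itertools import combinations


def merge_results(result_lists):
    """Merge (url, snippet) lists from several engines, dropping duplicate URLs."""
    seen, merged = set(), []
    for results in result_lists:
        for url, snippet in results:
            if url not in seen:
                seen.add(url)
                merged.append((url, snippet))
    return merged


def base_clusters(snippets, max_len=4, min_docs=2):
    """Map each phrase (word n-gram, up to max_len words) to the snippets containing it.

    Stand-in for the suffix tree: only phrases shared by at least
    min_docs snippets are kept as base clusters.
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = text.lower().split()
        for start in range(len(words)):
            stop = min(start + max_len, len(words))
            for end in range(start + 1, stop + 1):
                phrase_docs[tuple(words[start:end])].add(doc_id)
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}


def merge_clusters(bases, threshold=0.5):
    """Join base clusters whose snippet sets overlap by more than threshold both ways."""
    phrases = list(bases)
    parent = {p: p for p in phrases}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        common = len(bases[a] & bases[b])
        if common / len(bases[a]) > threshold and common / len(bases[b]) > threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for p in phrases:
        groups[find(p)].append(p)

    clusters = []
    for members in groups.values():
        docs = set().union(*(bases[p] for p in members))
        # Assumed labelling rule: the phrase covering the most snippets,
        # with ties broken in favour of the longer phrase.
        label = max(members, key=lambda p: (len(bases[p]), len(p)))
        clusters.append((" ".join(label), sorted(docs)))
    return sorted(clusters, key=lambda c: -len(c[1]))
```

Given two hypothetical engines whose merged snippets are "suffix tree clustering", "suffix tree construction", "jaguar car dealer", and "jaguar car review", the sketch produces two clusters labelled "suffix tree" and "jaguar car", each holding two snippets.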
Field: [Automation and Computer Technology]