Affiliation: State Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University
Source: 《计算机科学》 (Computer Science), 2003, No. 11, pp. 112-115 (4 pages)
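Abstract: 1. Introduction. The search engines now in wide use on the Internet make it easier to find resources, and they are supplemented by various techniques for improving precision. However, they generally lack guidance capability: they cannot help users determine the domain in which the needed information lies, so the results returned are often entirely irrelevant. What is urgently needed, therefore, is an intelligent, personalized search tool that lets different users discover and accumulate information in different domains. Extracting semantic information from the substantive pages of the Web and using it in search engines can lead to intelligent retrieval and other personalized services. This paper focuses on the analysis of Web page classification information. With an ontology as the base, a classifier built on TFIDF word weights and the Rocchio algorithm is combined with EM to improve its accuracy. It is shown that, when labeled samples are limited, this EM procedure improves accuracy by making use of unlabeled pages.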
Field: [Automation and Computer Technology]
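
The abstract names three ingredients: TFIDF word weights, a Rocchio classifier, and EM over unlabeled pages. The record does not give the paper's formulas, so the following is only a minimal sketch of how such a combination might look, not the authors' implementation. Everything here is an assumption made for illustration: the function names (`build_idf`, `rocchio_em`, `classify`), the smoothed IDF formula, the Rocchio weights `beta=16` and `gamma=4`, the softmax `sharpness` used to turn cosine similarities into soft class memberships, and the toy documents. The ontology component mentioned in the abstract is not modeled.

```python
import math
from collections import Counter, defaultdict

def build_idf(docs):
    """Smoothed inverse document frequency over tokenised documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def unit(vec):
    """Scale a sparse vector (dict) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def vectorise(doc, idf):
    """Unit-length TF-IDF vector; terms without an IDF entry are ignored."""
    tf = Counter(t for t in doc if t in idf)
    return unit({t: c * idf[t] for t, c in tf.items()})

def dot(a, b):
    """Dot product of sparse vectors (cosine when both are unit length)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def rocchio_prototypes(vectors, memberships, classes, beta, gamma):
    """Rocchio prototype per class: beta * weighted mean of in-class vectors
    minus gamma * weighted mean of the rest, negative weights clipped."""
    protos = {}
    for c in classes:
        pos_mass = sum(m[c] for m in memberships) or 1.0
        neg_mass = sum(1.0 - m[c] for m in memberships) or 1.0
        proto = defaultdict(float)
        for vec, m in zip(vectors, memberships):
            scale = beta * m[c] / pos_mass - gamma * (1.0 - m[c]) / neg_mass
            for term, w in vec.items():
                proto[term] += scale * w
        protos[c] = unit({t: w for t, w in proto.items() if w > 0.0})
    return protos

def soft_membership(vec, protos, sharpness):
    """E-step helper: softmax over cosine similarities (an assumed stand-in
    for a class posterior)."""
    scores = {c: math.exp(sharpness * dot(vec, p)) for c, p in protos.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def rocchio_em(labeled, unlabeled, iterations=5, beta=16.0, gamma=4.0,
               sharpness=10.0):
    """Train on labeled pages, then refine prototypes with unlabeled ones."""
    idf = build_idf([doc for doc, _ in labeled] + unlabeled)
    lab_vecs = [vectorise(doc, idf) for doc, _ in labeled]
    unl_vecs = [vectorise(doc, idf) for doc in unlabeled]
    classes = sorted({label for _, label in labeled})
    hard = [{c: float(c == label) for c in classes} for _, label in labeled]
    protos = rocchio_prototypes(lab_vecs, hard, classes, beta, gamma)
    for _ in range(iterations):
        # E-step: soft class memberships for the unlabeled pages.
        soft = [soft_membership(v, protos, sharpness) for v in unl_vecs]
        # M-step: rebuild prototypes from labeled and unlabeled pages.
        protos = rocchio_prototypes(lab_vecs + unl_vecs, hard + soft,
                                    classes, beta, gamma)
    return idf, protos

def classify(doc, idf, protos):
    """Assign the class whose prototype is most similar to the page."""
    vec = vectorise(doc, idf)
    return max(protos, key=lambda c: dot(vec, protos[c]))

# Toy data (hypothetical): one labeled page per class, four unlabeled pages.
labeled = [(["football", "match", "goal"], "sports"),
           (["python", "code", "compiler"], "tech")]
unlabeled = [["goal", "striker", "league"], ["striker", "league", "football"],
             ["compiler", "parser", "code"], ["parser", "lexer", "python"]]
idf, protos = rocchio_em(labeled, unlabeled)
print(classify(["striker", "league"], idf, protos))
```

In this toy setup the query terms "striker" and "league" never occur in a labeled page; they enter the prototypes only through the unlabeled pages, which is the effect the abstract attributes to EM when labeled samples are limited.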