
Web Usage Mining and the Research of Personalized Recommendation
(Original title: WEB使用挖掘与网页个性化服务推荐研究)

Advisor: 刘建平

Discipline: H1203

Degree: Master's

Author:

Institution: Zhejiang Sci-Tech University (浙江理工大学)

Abstract: Data mining is an important topic in computer science, artificial intelligence, and database research. It is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random real-world data. Web pages contain complex, unstructured, and dynamic data. Analyzing the vast amount of information on the Web and providing personalized recommendation services tailored to users' needs is an important application of data mining today. Building on previous research, this thesis studies Web usage mining. Its main content is summarized as follows:

(1) The basic theory and classification of data mining are surveyed, and the data sources of Web usage mining and the basic workflow of data preprocessing are analyzed in detail.

(2) The theory of association rules is introduced in detail, the performance of the classic Apriori algorithm is analyzed, and an improved algorithm is proposed. The improved algorithm adds a pruning step before the natural join that generates candidate itemsets. This reduces the number of itemsets taking part in the join and therefore the size of the generated candidate itemsets, which in turn reduces the number of loop iterations and the running time. It also cuts redundant checks in the join-condition test.

(3) The basic idea and procedure of the K-means clustering algorithm are described in detail and its advantages and disadvantages are analyzed. An improved K-means algorithm, called the MFA algorithm, is proposed. In K-means, determining the new cluster centers after each adjustment requires a large number of distance computations. The proposed method instead uses information about how the cluster centers have moved to determine the new centers, and it reduces the computational complexity of the filtering algorithm by selecting candidates from the set of active (still moving) cluster centers.

(4) Log data from the campus network website are preprocessed and analyzed, and the improved algorithms are applied to discover users' access patterns. The mining results are then used to add a personalized recommendation feature to the website, which proactively recommends information that users are likely to be interested in.
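
To make item (2) more concrete, here is a minimal Python sketch of Apriori candidate generation with an extra pruning step before the join. The abstract does not spell out the thesis's pruning rule, so the rule used here is an assumption: every item of a frequent k-itemset occurs in exactly k-1 of that itemset's (k-1)-subsets, so any frequent (k-1)-itemset containing an item that appears in fewer than k-1 frequent (k-1)-itemsets can be dropped before joining. The function names (`prefilter`, `generate_candidates`, `apriori`) and the sample page names are hypothetical and do not come from the thesis.

```python
from collections import Counter
from itertools import combinations


def prefilter(frequent_prev, k):
    """Assumed pre-join prune (not confirmed by the abstract).

    Keep only those frequent (k-1)-itemsets whose every item occurs in at
    least k-1 of the frequent (k-1)-itemsets; the others cannot be part of
    any frequent k-itemset.
    """
    counts = Counter(item for itemset in frequent_prev for item in itemset)
    return [s for s in frequent_prev if all(counts[i] >= k - 1 for i in s)]


def generate_candidates(frequent_prev, k):
    """Join sorted (k-1)-itemsets that share their first k-2 items."""
    pruned = sorted(prefilter(frequent_prev, k))
    pruned_set = set(pruned)
    candidates = []
    for a, b in combinations(pruned, 2):
        if a[:-1] == b[:-1]:
            cand = a + (b[-1],)
            # Classic Apriori check: every (k-1)-subset must be frequent.
            if all(sub in pruned_set for sub in combinations(cand, k - 1)):
                candidates.append(cand)
    return candidates


def apriori(transactions, min_support):
    """Return {itemset (sorted tuple): support count} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    item_counts = Counter(i for t in transactions for i in t)
    current = sorted((i,) for i, c in item_counts.items() if c >= min_support)
    result = {s: item_counts[s[0]] for s in current}
    k = 2
    while current:
        cands = generate_candidates(current, k)
        counts = {c: sum(1 for t in transactions if t.issuperset(c))
                  for c in cands}
        current = sorted(c for c, n in counts.items() if n >= min_support)
        result.update({c: counts[c] for c in current})
        k += 1
    return result


# Hypothetical sessions: each set holds the pages visited in one session.
sessions = [
    {"index", "news", "courses"},
    {"index", "news"},
    {"index", "courses"},
    {"news", "courses", "index"},
    {"library"},
]
frequent = apriori(sessions, min_support=2)
```

In this sketch the pre-join filter empties the itemset list as soon as no longer itemset is possible, so the final iteration performs no join at all. That is one way a prune before the join can shrink the candidate set and save work, under the assumption stated above.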

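Keywords: data mining; usage mining; personalized recommendation; algorithm; algorithm

Classification code: [TN]

Field: [Electronics and Telecommunications]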
