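Affiliation: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences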
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2005, No. 2, pp. 217-223 (7 pages)
Abstract: FP-growth is currently one of the more efficient algorithms for mining frequent patterns, but it is not efficient when applied to mining maximal frequent patterns (MFPs). This paper analyzes the causes of that inefficiency and, based on the analysis, presents SFP-Max, an algorithm that mines MFPs using a sorted FP-tree. The main ideas are: (1) the algorithm is based on a sorted FP-tree; (2) properties of MFPs are used to reduce the number of candidate maximal patterns generated; (3) an intermediate result set is maintained to narrow the range of itemsets that each candidate must be checked against, which reduces the time spent testing candidates. In the performance study, SFP-Max is compared with MAFIA, one of the most efficient algorithms for mining MFPs. The experimental results show that SFP-Max is an efficient algorithm: on the data sets tested, its performance is comparable to MAFIA's, and in most cases it outperforms MAFIA.
Field: [Automation and Computer Technology]
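
For readers unfamiliar with the term used in the abstract, a maximal frequent pattern is a frequent itemset that has no frequent proper superset. The sketch below illustrates only that definition and the idea of testing candidates against a set of already accepted results; it is a brute-force illustration, not the SFP-Max algorithm, and it does not build an FP-tree. The function names `frequent_itemsets` and `maximal_only`, and the use of an absolute support count, are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Enumerate all frequent itemsets by brute force (illustration only)."""
    items = sorted({item for t in transactions for item in t})
    frequent = []
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Support = number of transactions containing the candidate.
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                frequent.append(cset)
                found = True
        if not found:
            # Every subset of a frequent itemset is frequent, so if no
            # itemset of this size is frequent, no larger one can be.
            break
    return frequent


def maximal_only(frequent):
    """Keep only itemsets that have no frequent proper superset."""
    maximal = []
    # Visit larger itemsets first, so any maximal superset of a candidate
    # has already been accepted by the time the candidate is tested.
    for itemset in sorted(frequent, key=len, reverse=True):
        if not any(itemset < kept for kept in maximal):
            maximal.append(itemset)
    return maximal


transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "d"},
    {"b", "d"},
]
result = maximal_only(frequent_itemsets(transactions, min_support=2))
print(sorted(sorted(s) for s in result))
```

In this toy data set, checking each candidate only against the accepted maximal itemsets, rather than against every frequent itemset, is a much simplified analogue of the narrowed candidate test the abstract describes.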