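Affiliation: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093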
Source: 《软件导刊》 (Software Guide), 2017, No. 9, pp. 65-67, 71 (4 pages)
Abstract: To address the low efficiency of traditional top-k association rule mining algorithms based on the ε-differential privacy model in large-scale data environments, a parallel differentially private association rule mining algorithm is proposed. The algorithm uses the Hadoop framework for parallel computation and a load-balancing strategy so that each node is assigned a comparable amount of data. It selects k frequent patterns with the exponential mechanism and then adds noise to those k frequent patterns with the Laplace mechanism. Experiments compare the algorithm's frequent pattern mining results with those of similar algorithms. The results show that the algorithm is more efficient than traditional algorithms while keeping the mining results usable.
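The two privacy steps named in the abstract (exponential-mechanism selection of k frequent patterns, then Laplace noise on their supports) can be illustrated with a minimal single-machine sketch. It is not the paper's implementation: the Hadoop parallelization and load balancing are omitted, and the function names, the even split of the privacy budget between the two steps, and the sensitivity of 1 for support counts are all assumptions made for illustration.

```python
import math
import random


def exponential_mechanism_top_k(supports, k, epsilon, sensitivity=1.0, rng=None):
    """Pick k patterns without replacement; each draw uses epsilon / k (assumed split)."""
    rng = rng or random.Random()
    remaining = dict(supports)
    eps_each = epsilon / k
    chosen = []
    for _ in range(min(k, len(remaining))):
        items = list(remaining)
        best = max(remaining[p] for p in items)
        # Weight is exp(eps * score / (2 * sensitivity)); subtracting the best
        # score keeps the exponent non-positive and avoids overflow.
        weights = [
            math.exp(eps_each * (remaining[p] - best) / (2.0 * sensitivity))
            for p in items
        ]
        pick = rng.choices(items, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_top_k(supports, k, epsilon, rng=None):
    """Return {pattern: noisy support} for k privately selected patterns.

    Assumption: half of epsilon goes to selection and half to noise.
    """
    rng = rng or random.Random()
    half = epsilon / 2.0
    chosen = exponential_mechanism_top_k(supports, k, half, rng=rng)
    # Each released count has assumed sensitivity 1, and the noise budget is
    # shared by all released counts, so the scale is len(chosen) / half.
    scale = len(chosen) / half
    return {p: supports[p] + laplace_noise(scale, rng) for p in chosen}


if __name__ == "__main__":
    # Hypothetical support counts for a handful of candidate patterns.
    counts = {("a",): 120, ("b",): 95, ("a", "b"): 80, ("c",): 12}
    print(private_top_k(counts, k=2, epsilon=1.0, rng=random.Random(0)))
```

A smaller epsilon makes low-support patterns more likely to be selected and makes the released counts noisier. In a parallel setting such as the one the abstract describes, the support counts passed in would be the aggregated output of the distributed counting stage.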