Author:
Affiliation: College of Physics and Electronic Information, Wenzhou University
Source: Computer Engineering (《计算机工程》), 2010, No. 18, pp. 197-199, 202 (4 pages)
Abstract: To identify plagiarism within a local document set, this paper defines the document plagiarism distance as an approximation of a generalized edit distance based on the number of backward returns and the number of forward skips. It analyzes sufficient conditions under which this distance satisfies the triangle inequality and the weak triangle inequality, and proposes a fast full-text identification algorithm that finds all ordered pairs of documents in the set suspected of plagiarism. Experimental results show that, compared with other algorithms, the proposed algorithm improves identification efficiency by 3 to 5 times while maintaining recall.
Field: [Automation and Computer Technology]