Affiliation: College of Computer Science, Beijing University of Technology
Source: Journal of Computer Applications (《计算机应用》), 2014, Issue 5, pp. 1345-1349 (5 pages)
Abstract: In topic clause (TC) identification, generating the candidate topic clauses (CTCs) of a punctuation clause (PClause) by brute-force enumeration hurts both the running time of the system and the accuracy of TC identification. This paper proposes a new CTC generation method that uses the position of the PClause in the text, the grammatical features of the topic, and the adjacency between the topic string and its comment to guide candidate generation. Experimental results show that the method reduces the number of CTCs and improves system efficiency. Moreover, compared with TC identification based on exhaustive CTC generation, it raises the accuracy of TC identification by 0.96 percentage points for single PClauses and by 1.31 percentage points for PClause sequences.
Field: [Automation and Computer Technology]