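Affiliation: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences (中国科学院新疆理化技术研究所)
Source: 《计算机应用研究》 (Application Research of Computers), 2013, No. 4, pp. 1112-1115 (4 pages)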
Abstract: To address the large number of out-of-vocabulary (OOV) words in Uyghur-Chinese statistical machine translation and the scarcity of Uyghur language resources, this paper proposes an OOV word recognition model for Uyghur-Chinese machine translation based on string similarity, drawing on the word-formation features of Uyghur and corresponding string similarity algorithms. Using the phrase table of the phrase-based model and an external dictionary, the model computes the similarity between each untranslated Uyghur word and the Uyghur entries, and takes the Chinese translation of the most similar entry as the final translation of the OOV word. Experiments show that, compared with an OOV recognition method based on stem segmentation, the model better preserves Uyghur word information and improves translation quality.
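Keywords: Uyghur-Chinese machine translation; phrase table; string similarity algorithm; out-of-vocabulary word; word segmentation; edit distance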
Field: [Automation and Computer Technology]
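
The lookup step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes similarity is edit distance normalized by the longer string's length, and it flattens the phrase table and external dictionary into a single source-to-target mapping. The function names, the `threshold` parameter, and the toy Latin-script entries are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed measure: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def translate_oov(word: str,
                  lexicon: Dict[str, str],
                  threshold: float = 0.0) -> Optional[Tuple[str, float]]:
    """Return (translation, score) of the most similar source entry,
    or None if the lexicon is empty or the best score is below threshold."""
    best_target, best_score = None, -1.0
    for source, target in lexicon.items():
        score = similarity(word, source)
        if score > best_score:
            best_target, best_score = target, score
    if best_target is None or best_score < threshold:
        return None
    return best_target, best_score


# Toy entries for illustration only; not taken from the paper's data.
toy_lexicon = {"kitab": "书", "mektep": "学校"}
result = translate_oov("kitablar", toy_lexicon)
```

Here the inflected form `kitablar` is matched to the entry `kitab` without any stem segmentation. A real system would likely need a similarity threshold to avoid forcing a translation onto unrelated words.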