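Institution: Institute of Language Information Processing, School of Information Science, Beijing Language and Culture University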
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2009, No. 3, pp. 475-480 (6 pages)
Abstract: Recognizing Chinese transliterations of Tibetan personal names is a kind of person-name recognition, but existing person-name recognition methods do not fully fit the naming characteristics of Tibetan names. First, Tibetan names carry strong religious and cultural connotations, so their character (string) features and internal structure are complex. Second, Tibetan names contain many high-frequency single characters, which makes ambiguity conflicts between Tibetan names and common words especially prominent and blurs the boundary between a name and its surrounding context. Based on a survey of a large collection of Tibetan name instances and corpora, this paper statistically analyzes the character (string) usage features of Tibetan names and builds an attribute feature library for them. Using the naming rules and these attribute features, Tibetan names are represented formally, and an automatic recognition system for Chinese transliterations of Tibetan names is implemented. In an open test on a real-world corpus, the system achieves an F-measure of 87.12%.
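The abstract reports its result as an F-measure but gives neither the formula nor the underlying precision and recall. Assuming it is the usual balanced F-measure (F1, the harmonic mean of precision and recall), the following minimal Python sketch shows how such a score is computed from name-recognition counts. The function names `f_measure` and `prf_from_counts` are hypothetical and are not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (beta=1 gives the balanced F1).

    Assumption: the paper's "F value" is the standard balanced F-measure.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing is correct
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def prf_from_counts(correct: int, predicted: int, gold: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from counts (hypothetical helper).

    correct:   names the system recognized with the exact correct boundaries
    predicted: all strings the system labeled as Tibetan names
    gold:      all Tibetan names annotated in the test corpus
    """
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    # Illustrative counts only; these are NOT figures from the paper.
    p, r, f = prf_from_counts(correct=80, predicted=100, gold=90)
    print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")
```

With these illustrative counts, P = 0.8, R = 8/9 ≈ 0.889 and F1 = 16/19 ≈ 0.842. Because F1 is a harmonic mean, a system that over-proposes names (for example, by tagging common high-frequency characters as name parts) is penalized through lower precision even if recall stays high.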
Fields: [Language and Linguistics] [Automation and Computer Technology]