Institution: Tsinghua University
Source: Journal of Chinese Information Processing (《中文信息学报》), 2007, Issue 3, pp. 83-91 (9 pages)
Abstract: As the Internet has entered everyday social life, online chat has become a new channel of communication, giving rise to network chat language. The growing richness of this language poses new challenges for language information processing, and our study finds that the difficulties come mainly from its anomalous and dynamic nature. Using real network chat text, this paper analyzes and characterizes these two features in detail and designs methods for recognizing network chat language text and converting it to standard language that address both of them. We first build a chat language model and a language conversion model from a network chat language corpus and use a source-channel model to convert chat language into standard language. This method, however, relies too heavily on the chat language corpus: it handles the anomalous nature fairly well but cannot cope with the dynamic nature. We therefore build a phonetic mapping model (mapping characters to pronunciations) from a standard Chinese corpus and use it to extend the source-channel model; experiments show that the extended method effectively solves the dynamic problem of network chat language.
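A source-channel (noisy-channel) converter picks, for an observed chat text C, the standard text S that maximizes P(S) · P(C|S). The sketch below illustrates that decision rule under loudly stated assumptions: it converts isolated tokens, uses a hypothetical toy pinyin lexicon (`PINYIN`) and hypothetical unigram probabilities (`P_STANDARD`), and swaps in a channel score that decays with pinyin edit distance as a stand-in for a phonetic mapping model. None of these names or numbers come from the paper, and this is not the authors' model; it only shows why a pronunciation-based channel can handle a chat spelling such as 稀饭 (for 喜欢) without chat-corpus counts for it.

```python
import math

# Hypothetical toy lexicon (toneless pinyin); stands in for a phonetic
# mapping that would be built from a standard Chinese corpus.
PINYIN = {
    "偶": "ou", "我": "wo",
    "稀饭": "xifan", "喜欢": "xihuan",
    "米": "mi", "没": "mei",
}

# Hypothetical standard-language unigram probabilities P(S).
P_STANDARD = {"我": 0.05, "喜欢": 0.01, "没": 0.02, "米": 0.001}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two pinyin strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]


def channel_prob(chat: str, std: str, alpha: float = 1.0) -> float:
    """Unnormalized P(C|S): decays exponentially with pinyin edit distance.

    This similarity-based channel is an assumption for illustration only.
    """
    return math.exp(-alpha * edit_distance(PINYIN[chat], PINYIN[std]))


def convert(chat: str) -> str:
    """Source-channel decoding for one token: argmax_S P(S) * P(C|S)."""
    return max(P_STANDARD, key=lambda s: P_STANDARD[s] * channel_prob(chat, s))


if __name__ == "__main__":
    for token in ("偶", "稀饭", "米"):
        print(token, "->", convert(token))
```

Because the channel score depends only on pronunciation, a new chat spelling needs just a pinyin entry rather than training counts from a chat corpus. A practical system would also score whole sentences with a context-aware language model (for example an n-gram model) instead of isolated tokens.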
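Keywords: computer application; Chinese information processing; network chat language; anomalous nature; dynamic nature; language information processing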
Field: [Automation and Computer Technology]