
Research on a Naive Bayes Automatic Text Classification System Based on Artificial Intelligence Techniques
The Study of Naive Bayes Text Classification System Based on Artificial Intelligence

Supervisor: 孙炳达

Discipline: H1103

Degree: Master's

Author:

Institution: Guangdong University of Technology (广东工业大学)

Abstract: This thesis first analyzes the key techniques and algorithms of current automatic text classification. It then describes the core techniques and system architecture of typical automatic text classification systems and summarizes the application areas of text classification. To address the inherent shortcomings of the Naive Bayes text classification algorithm (chiefly its conditional independence assumption), the thesis introduces fuzzy systems and neural networks into text information processing. It discards the weaknesses of each technique and combines their strengths: the knowledge-based prior rules of fuzzy systems, which fit closely with supervised classification, and the strong learning ability of neural networks, which improves the robustness and generalization of the classifier. Using these, it modifies the Naive Bayes classification algorithm, implements an AI-based Bayesian text classification system, and compares classification performance before and after the modification. Experimental results show that the modified algorithm not only substantially improves the classification accuracy of the Naive Bayes system but also smooths the distribution of classification accuracy on the training set, producing results closer to the way humans classify knowledge.

With the arrival of the information age, and especially as the Internet increasingly shapes everyday life, a tremendous amount of information appears in text form on the Internet, in digital libraries, and on corporate intranets. Obtaining the needed information quickly and accurately has become a research hotspot in information processing. Text classification based on artificial intelligence (AI) is one approach to this problem. This thesis discusses text classification from the perspectives of classification theory, algorithm modification, and implementation.

First, traditional solutions to key technical problems in text categorization are studied, the core techniques and system architecture of typical text categorization systems are discussed, and the applications of text categorization are summarized. From a statistical point of view, traditional statistical text classification methods are powerful, but they often rest on assumptions that do not hold for real-world data, and their results can be hard to interpret. They deliver a high precision that is not always necessary yet can be costly, and using them requires a solid mathematical background. The Naive Bayes classifier, a simple but powerful statistical classifier, is then studied in depth. In real text, words are inevitably semantically associated with their context; that is, the individual words in a document are not independent and identically distributed. The strong conditional independence and distribution assumptions underlying the Naive Bayes classifier therefore do not hold for real text feature vectors and can sometimes lead to poor classification performance.

To address these shortcomings, fuzzy systems and neural networks are introduced into text information processing to improve Naive Bayes classification performance, discarding the weaknesses of each technique while combining their strengths. Two properties are studied in particular as the basis for amending the Naive Bayes algorithm: a fuzzy system can use prior (rule-based) knowledge, which resembles supervised text categorization, and a neural network has a learning capability that builds up adaptation to a changing environment. A Naive Bayes classifier based on AI is then implemented. The experimental results demonstrate that the amended algorithm not only raises classification accuracy substantially but also smooths the distribution of accuracy across categories, yielding classification results similar to those of manual (human) classification.
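The abstract's critique centers on the conditional independence assumption of Naive Bayes: a document made of words w1, w2, …, wn is scored for class c as P(c) × P(w1|c) × P(w2|c) × … × P(wn|c), as if each word occurred independently of its neighbors. The following is a minimal sketch of a standard multinomial Naive Bayes text classifier that makes this assumption visible. It is not the thesis's implementation and does not include the fuzzy-system or neural-network modifications, whose details are not given in this record. The class name `MultinomialNaiveBayes`, the add-alpha (Laplace) smoothing, and the toy training data are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes text classifier (illustrative sketch).

    Assumes documents are pre-tokenized lists of words and that words are
    conditionally independent given the class (the "naive" assumption).
    """

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                      # add-alpha (Laplace) smoothing constant
        self.class_doc_counts = Counter()       # class -> number of training documents
        self.word_counts = defaultdict(Counter) # class -> word -> token count
        self.total_words = Counter()            # class -> total token count
        self.vocab = set()                      # all words seen in training

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.total_words[label] += len(tokens)
            self.vocab.update(tokens)
        return self

    def log_posterior(self, tokens, label):
        """Unnormalized log P(c | d) = log P(c) + sum_i log P(w_i | c)."""
        n_docs = sum(self.class_doc_counts.values())
        score = math.log(self.class_doc_counts[label] / n_docs)  # log prior P(c)
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for w in tokens:
            if w not in self.vocab:
                continue  # skip words never seen in training
            # Independence assumption: each word adds its own log-likelihood term,
            # regardless of which words surround it.
            score += math.log((self.word_counts[label][w] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        # Pick the class with the highest (unnormalized) log posterior.
        return max(self.class_doc_counts, key=lambda c: self.log_posterior(tokens, c))


# Toy usage with hypothetical data.
train_docs = [
    "stock market shares rise".split(),
    "market prices fall shares".split(),
    "team wins football match".split(),
    "football player scores goal".split(),
]
train_labels = ["finance", "finance", "sports", "sports"]
clf = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(clf.predict("shares market".split()))  # finance
print(clf.predict("football goal".split()))  # sports
```

Working in log space turns the product of word probabilities into a sum and avoids numerical underflow on long documents. Because each word contributes an independent term, a phrase like "market share" is scored no differently from the same two words scattered across unrelated sentences; this is the kind of context-blindness the thesis sets out to correct.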

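Keywords: text classification; Naive Bayes; fuzzy system; neural network; artificial intelligence

Classification numbers: [G254.11 G254.361]

Field: [Culture and Science] [Culture and Science]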


Author: 贺星星
Author: 奉国和
Author: 李利梅
Author: 周凌燕
Author: 肖可


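Institution: South China University of Technology (华南理工大学)
Institution: School of Business Administration, South China University of Technology (华南理工大学工商管理学院)
Institution: Jinan University (暨南大学)
Institution: Sun Yat-sen University (中山大学)
Institution: Guangdong University of Foreign Studies (广东外语外贸大学)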


Author: 庞菊香
Author: 康超
Author: 廖燕萍
Author: 廖荆梅
Author: 张丽娟