Institution: College of Computer Science and Information Engineering, Guangxi Normal University
Source: Computer Applications and Software (《计算机应用与软件》), 2010, No. 4, pp. 146-148, 161 (4 pages)
Abstract: Ontology-based information extraction combines ontologies with information processing techniques to extract information. This paper proposes an ontology-based information extraction method for the tourism domain. The method uses keywords from a tourism ontology to locate the information region of a web page, extracts the main text from the page, and applies word segmentation and filtering. It then performs ontology matching using rules written for JAPE (Java Annotation Patterns Engine), producing structured content that is stored in a database. Finally, experiments demonstrate the accuracy of the proposed method.
Field: [Automation and Computer Technology]
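
The pipeline outlined in the abstract (locate the information region by ontology keywords, segment and filter the text, then match it against the ontology) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy ontology, the stop-word list, and all function names are hypothetical, plain keyword lookup stands in for the JAPE rules, regex splitting stands in for real word segmentation, and the database step is omitted.

```python
import re

# Hypothetical toy ontology: concept -> surface terms (not taken from the paper).
ONTOLOGY = {
    "Attraction": {"lake", "temple", "museum"},
    "Accommodation": {"hotel", "hostel"},
    "Transport": {"bus", "train"},
}
# Hypothetical stop-word list used for the filtering step.
STOP_WORDS = {"the", "a", "is", "and", "to", "of", "near", "by"}


def tokenize(text):
    """Lowercase word split; a stand-in for real word segmentation."""
    return re.findall(r"[a-z]+", text.lower())


def locate_region(blocks):
    """Return the text block that contains the most ontology keywords."""
    keywords = set().union(*ONTOLOGY.values())
    return max(blocks, key=lambda b: sum(t in keywords for t in tokenize(b)))


def extract(blocks):
    """Map the filtered tokens of the best block to ontology concepts."""
    tokens = [t for t in tokenize(locate_region(blocks)) if t not in STOP_WORDS]
    record = {}
    for concept, terms in ONTOLOGY.items():
        hits = sorted(set(tokens) & terms)
        if hits:
            record[concept] = hits
    return record


# Two blocks from an imagined page: navigation residue and the main text.
blocks = [
    "Home | Login | Contact us",
    "The hotel is near the lake and the temple; take the bus to the museum.",
]
record = extract(blocks)
print(record)
```

The resulting dictionary plays the role of the structured content that the abstract says is stored in a database; a rule engine such as JAPE would replace the set intersection with pattern rules over annotations.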