Authors:
Institution: Sun Yat-sen University
Source: Journal of Library Science in China (《中国图书馆学报》), 2017, No. 4, pp. 74-92 (19 pages)
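Abstract: Because relevant information snippets are scattered across massive, complex and diverse Internet information resources, users often have to spend a great deal of time browsing, searching for and collecting the information they need. Metadata for fine-grained aggregate units, designed for aggregated search, can reveal information features and their interrelations in depth, promoting knowledge discovery and improving the efficiency of knowledge services. It is therefore necessary to build a metadata description framework for fine-grained aggregate units. Using Internet information resources in library and information science, such as open access journal papers, online encyclopedias and blogs, as data sources, this paper applies logical-structure and formal-structure analysis to establish a framework for dividing resources into aggregate units. The framework covers external features at the chapter (whole-document) level, such as title and author, as well as features such as discourse intention and semantic function within section, sentence-group and chart units. By analyzing the attributes of aggregate units and reusing Dublin Core (DC) and Learning Object Metadata (LOM) elements, the paper builds a metadata framework that describes the access, physical and semantic information of aggregate units. It then designs a retrieval database and validates the framework experimentally. The experiment shows that the framework supports retrieval of fine-grained aggregate units at every level across multiple types of Internet information resources, and it can provide a theoretical basis and practical guidance for fine-grained information aggregation and search.
In the big data era, the Internet is increasingly indispensable for accessing academic and work-related information. However, because Internet resources are distributed in a decentralized way and their contents and relationships lack in-depth description and correlation, people have to spend a great deal of time looking through all the returned search results and assembling the relevant information from different sources. This paper therefore develops a metadata schema for fine-grained aggregate units of Internet resources that describes in depth and correlates scattered information snippets of various kinds, so as to meet users' complex information needs, improve retrieval effectiveness and support better knowledge services. First, three types of free Internet resources in library and information science were collected: OA papers, online encyclopedias and blogs. Then, a general framework for splitting these resources was developed manually from the perspectives of the logical structure and the formal structure of the text. The logical structure was divided into four levels: the chapter level (the whole document), the section level (based on the headings given by the authors), the sentence-group level (involving both macro-analysis and micro-analysis) and the chart level. Macro-analysis based on genre theory segmented the whole document into its components, and micro-analysis then identified the information snippets that reveal rhetorical intentions and semantic functions.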
The relationships between aggregate units at different levels were analyzed. The characteristics and attributes of aggregate units were then described and classified into 14 access-attribute elements, 3 physical-attribute elements and 2 semantic-attribute elements, and a metadata schema was developed around these categories. Finally, to examine the effectiveness of the metadata schema, Access 2013 was used to design and develop a database
Keywords: Internet information resources; information aggregation; fine-grained; aggregate unit; genre analysis; metadata
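The abstract describes the schema only at the level of unit levels and attribute categories. As a rough illustration, the Python sketch below models aggregate units at the four levels named in the abstract, each carrying separate access, physical and semantic attribute groups, together with a level-filtered search and a walk back up to the enclosing units. Everything beyond the level names and the three attribute categories is an assumption: the parent-child linking, the element keys (DC-style `title` and `format`, plus `rhetorical_intention` and `semantic_function`), the placeholder records, and the in-memory list that stands in for the paper's Access 2013 database are hypothetical and are not the paper's actual 14 + 3 + 2 elements.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The four unit levels named in the abstract, from coarsest to finest.
LEVELS = ("chapter", "section", "sentence_group", "chart")


@dataclass
class AggregateUnit:
    unit_id: str
    level: str
    parent_id: Optional[str]  # hypothetical link to the enclosing unit
    # Attribute groups follow the abstract's three categories; the keys used
    # below are illustrative and are NOT the paper's 14 + 3 + 2 elements.
    access: Dict[str, str] = field(default_factory=dict)
    physical: Dict[str, str] = field(default_factory=dict)
    semantic: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")


def search(units: List[AggregateUnit], level: Optional[str] = None,
           **semantic_filters: str) -> List[AggregateUnit]:
    """Return units (optionally at one level) whose semantic attributes match every filter."""
    return [
        u for u in units
        if (level is None or u.level == level)
        and all(u.semantic.get(k) == v for k, v in semantic_filters.items())
    ]


def ancestors(units: List[AggregateUnit], unit: AggregateUnit) -> List[AggregateUnit]:
    """Walk parent links upward so a hit can be shown in its document context."""
    by_id = {u.unit_id: u for u in units}
    chain = []
    while unit.parent_id is not None:
        unit = by_id[unit.parent_id]
        chain.append(unit)
    return chain


# Placeholder records (hypothetical) standing in for a retrieval database.
units = [
    AggregateUnit("d1", "chapter", None,
                  access={"title": "Example OA paper", "type": "OA paper"}),
    AggregateUnit("d1.s2", "section", "d1", access={"title": "Method"}),
    AggregateUnit("d1.s2.g1", "sentence_group", "d1.s2",
                  physical={"format": "text/plain"},
                  semantic={"rhetorical_intention": "describe method",
                            "semantic_function": "definition"}),
    AggregateUnit("d1.s2.c1", "chart", "d1.s2",
                  physical={"format": "image/png"},
                  semantic={"rhetorical_intention": "present data",
                            "semantic_function": "comparison"}),
]

hits = search(units, level="sentence_group", semantic_function="definition")
print([u.unit_id for u in hits])                        # ['d1.s2.g1']
print([a.unit_id for a in ancestors(units, hits[0])])   # ['d1.s2', 'd1']
```

Keeping the three attribute groups apart lets a query combine access elements with semantic ones while still returning a sub-document unit together with its section and document context. The paper's own validation used a database built in Access 2013, whose tables and queries the abstract does not describe.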