Advisor: 袁满
Discipline code: 081203
Degree conferred: Master's
Author:
Institution: Northeast Petroleum University (东北石油大学)
Abstract: Data is an enterprise asset, and the quality of that data determines the asset's value. Data quality problems are both widespread and unavoidable, so supplying high-quality data for enterprise decision-making has become a key constraint on the development of enterprise informatization.
Oilfield data are large in volume, varied in type, and span many disciplines, and as a result their quality is often poor. Taking data-center data as its object of study, this thesis analyzes the causes of data quality problems and, building on the theory of metadata, data quality, and data modeling, studies quality inspection, quality control, and quality assessment over the course of data flow. On this basis it proposes a metadata-based model for data quality control and assessment. The model embodies the idea of whole-process data quality control: the factors that affect data quality are distributed across several key stages, namely the quality of the data dictionary metadata, the quality of the constraint-rule database at the schema layer and the instance layer, and the quality of the data quality assessment criteria. If quality is controlled at each of these stages, the quality of enterprise data can be strongly assured.
The thesis builds a data dictionary for the data center's application data, so that metadata is controlled from the very start of data modeling. Based on the quality status of the existing data, it constructs a data quality definition model along four dimensions: integrity, consistency, accuracy, and timeliness. From this definition model it derives a repository of constraint-rule metadata models at the schema layer and the instance layer, and then uses that repository to control and assess data quality. For each link in which data quality problems may arise, metadata with a different function is applied, so that quality is controlled throughout the process and overall data quality improves.
Finally, the thesis presents an implementation of the metadata-based data quality control and assessment system and applies it, in a preliminary way, to a data center project. The validation results show that the system runs efficiently and performs well.
Keywords: data quality; metadata; constraint rules; quality assessment; data center
Classification number: [TP274]
Field: [Automation and Computer Technology]