Affiliation: School of Computer Science and Engineering, South China University of Technology (华南理工大学计算机科学与工程学院)
Source: Computer Engineering and Design (《计算机工程与设计》), 2010, No. 9, pp. 2057-2060 (4 pages)
Abstract: To address the weaknesses of the traditional extract-transform-load (ETL) architecture in data quality control, this paper proposes an ETL architecture oriented toward data quality management. Based on the characteristics of the ETL process, the design comprises a multi-data-source interface module, an ETL metadata description module, an ETL task description module, and a data quality control module, among others. The architecture treats data quality as its core: it builds a data analysis model, and a rule deduction engine uses the analysis results to generate a data cleaning scheme, so that the quality of the data stream can be evaluated and managed effectively. Following this design, an ETL tool named Data Quality ETL (DQETL) was developed. DQETL is designed with the Unified Modeling Language (UML) and provides a user-friendly interface for centralized management of ETL processes. Finally, an example illustrates the general steps of data quality management within this framework.
Keywords: data warehouse; data quality; extract; transform; load; rule deduction; data cleaning
Field: [Automation and Computer Technology]
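
The flow the abstract describes (analyze the data, let a rule engine turn the analysis results into a cleaning scheme) can be illustrated with a minimal sketch. It is not DQETL's actual implementation, which this record does not show. Every name here (`ColumnProfile`, `Rule`, `profile_column`, `derive_cleaning_plan`), the two metrics, the thresholds, and the sample data are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ColumnProfile:
    """Hypothetical analysis result for one column."""
    name: str
    null_ratio: float       # share of values that are missing
    duplicate_ratio: float  # share of values that repeat an earlier value


@dataclass
class Rule:
    """Hypothetical rule: if the condition holds, propose the action."""
    description: str
    condition: Callable[[ColumnProfile], bool]
    action: str


def profile_column(name, values):
    """Data analysis step: compute simple quality metrics for a column."""
    total = len(values)
    if total == 0:
        return ColumnProfile(name, 0.0, 0.0)
    non_null = [v for v in values if v is not None]
    nulls = total - len(non_null)
    duplicates = len(non_null) - len(set(non_null))
    return ColumnProfile(name, nulls / total, duplicates / total)


def derive_cleaning_plan(profiles, rules):
    """Rule deduction step: match every rule against every profile."""
    plan = []
    for profile in profiles:
        for rule in rules:
            if rule.condition(profile):
                plan.append((profile.name, rule.action))
    return plan


# Assumed thresholds; a real deployment would configure these per source.
RULES = [
    Rule("too many missing values", lambda p: p.null_ratio > 0.2, "fill_missing"),
    Rule("repeated values in column", lambda p: p.duplicate_ratio > 0.0, "deduplicate"),
]

# Made-up sample data standing in for an extracted source table.
rows = {
    "customer_id": [1, 2, 2, 4],
    "email": ["a@x.com", None, None, "d@x.com"],
}

profiles = [profile_column(name, values) for name, values in rows.items()]
plan = derive_cleaning_plan(profiles, RULES)
print(plan)
```

Keeping the rules as data, separate from the profiling code, is one way to let the cleaning scheme change without touching the analysis step, in the spirit of the rule deduction engine the abstract mentions.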