Issue Number: Vol. 1, No. 2
Year of Publication: 2011
Page Numbers: 536-544
Authors: Louardi BRADJI, Mahmoud BOUFAIDA
Journal Name: International Journal of Digital Information and Wireless Communications (IJDIWC) - Hong Kong


High data warehouse quality is key to making smart strategic decisions. Data cleaning is a process that addresses the quality problems of data extracted from operational sources before the data are loaded into the data warehouse. Because data cleaning can introduce errors and some data must be cleaned manually, users need to be openly involved in data cleaning to ensure data warehouse quality. This involvement is essential for users to validate the cleaned data, to replace the dirty data in their original sources, and to correct poor-quality data that cannot be cleaned automatically. In this paper, we extend the data cleaning and extract-transform-load (ETL) processes to better support user involvement in data quality management. We propose that the ETL process include two phases: a transformation phase that cleans the data extracted from the operational data sources, and a propagation phase that sends the cleaned data back to their original sources. The major benefits of our proposal are twofold. First, it allows users to validate the cleaned data. Second, it improves the quality of the operational data sources. Consequently, data cleaning based on user involvement leads to total data quality management and avoids redoing the same cleaning for future warehousing.
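The two-phase extension described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`transform`, `propagate`, `load`), the dictionary-based "source" and "warehouse", and the toy cleaning rule are all hypothetical, chosen only to show how automatic cleaning, user validation, manual correction and propagation back to the source could fit together.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical cleaning rule: returns the cleaned value, or None when the
# value cannot be cleaned automatically and must be corrected by a user.
CleanRule = Callable[[str], Optional[str]]


def transform(source: Dict[str, str], rule: CleanRule) -> Tuple[Dict[str, str], List[str]]:
    """Phase 1 (transformation): propose cleaned values for the extracted data.

    Returns the proposed corrections (only for values that would change)
    and the keys that need manual cleaning.
    """
    proposed: Dict[str, str] = {}
    manual: List[str] = []
    for key, value in source.items():
        cleaned = rule(value)
        if cleaned is None:
            manual.append(key)
        elif cleaned != value:
            proposed[key] = cleaned
    return proposed, manual


def propagate(source: Dict[str, str], proposed: Dict[str, str],
              approved: List[str], manual_fixes: Dict[str, str]) -> None:
    """Phase 2 (propagation): write user-validated corrections back to the source."""
    for key in approved:          # automatic corrections the user accepted
        source[key] = proposed[key]
    source.update(manual_fixes)   # values the user cleaned by hand


def load(source: Dict[str, str], warehouse: Dict[str, str], pending: List[str]) -> None:
    """Load every source record that is no longer awaiting a user decision."""
    for key, value in source.items():
        if key not in pending:
            warehouse[key] = value


def demo_rule(value: str) -> Optional[str]:
    # Illustrative rule only: normalise whitespace and case, give up on "?".
    if "?" in value:
        return None
    return " ".join(value.split()).title()


source = {"c1": "  alice smith", "c2": "BOB JONES", "c3": "??"}
warehouse: Dict[str, str] = {}

proposed, manual = transform(source, demo_rule)
approved = ["c1", "c2"]            # the user validates both automatic corrections
fixes = {"c3": "Carol White"}      # the user cleans c3 by hand
propagate(source, proposed, approved, fixes)

# Keys still dirty: rejected proposals and unfixed manual cases.
pending = [k for k in list(proposed) + manual if k not in approved and k not in fixes]
load(source, warehouse, pending)
```

Because the validated corrections are written back to the operational source, running `transform` again on that source proposes no further changes. This mirrors, in toy form, the claim that the approach avoids redoing the same cleaning for future warehousing.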