DMBVA - A Compression-Based Distributed Data Warehouse Management In Parallel Environment
Abstract
Parallel and distributed data warehouse architectures have evolved to support online queries on massive data within a short time. However, the emergence of e-applications has been generating extremely high volumes of data, reaching the terabyte threshold. Conventional data warehouse management systems are costly in terms of storage space and processing speed, and are sometimes unable to handle such large amounts of data. As a result, there is a pressing need for new algorithms and techniques to store and manipulate these data. In this paper, we present a compression-based distributed data warehouse architecture, ‘DMBVA’, that stores warehouse data and supports online queries efficiently. We achieve a compression factor of 25-30 compared to an SQL Server data warehouse. The main computational component of a data warehouse is the generation and querying of the data cube. Our algorithm, ‘PCVDC’, generates the data cube in parallel directly from the compressed form of the data. The size of the data cube is reduced by a factor of 30-45 compared to existing methods, and the response time is also significantly improved. These improvements are achieved by eliminating suffix and prefix redundancy, by the virtual nature of the data cube, by direct addressability of the compressed form of the data, and by parallel computation. Experimental evaluation shows improved performance over existing systems.
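To make the idea of building a data cube directly from compressed data more concrete, the following is a minimal sketch in Python. It is not the DMBVA storage format or the PCVDC algorithm, neither of which is specified in this abstract. It assumes plain dictionary encoding as the compression scheme, and the function names (`dictionary_encode`, `cube_from_codes`) and the sample data are hypothetical. The sketch runs sequentially and does not attempt prefix or suffix redundancy elimination. It only illustrates that aggregation can operate on integer codes without decompressing the original values.

```python
from collections import defaultdict
from itertools import combinations


def dictionary_encode(column):
    """Replace each value with a small integer code (assumed compression scheme).

    Returns the list of codes and the dictionary; dictionary[code] recovers
    the original value, so any row is directly addressable by position.
    """
    dictionary = sorted(set(column))
    index = {value: code for code, value in enumerate(dictionary)}
    return [index[value] for value in column], dictionary


def cube_from_codes(encoded_dims, measure):
    """Compute SUM(measure) for every subset of dimensions, using codes only."""
    names = list(encoded_dims)
    cube = {}
    for size in range(len(names) + 1):
        for group in combinations(names, size):
            totals = defaultdict(int)
            for row, value in enumerate(measure):
                # The group-by key is built from integer codes, never from
                # the decompressed strings.
                key = tuple(encoded_dims[dim][row] for dim in group)
                totals[key] += value
            cube[group] = dict(totals)
    return cube


# Hypothetical fact table with two dimensions and one measure.
region = ["east", "west", "east", "west"]
product = ["pen", "pen", "ink", "pen"]
sales = [10, 20, 5, 7]

region_codes, region_dict = dictionary_encode(region)
product_codes, product_dict = dictionary_encode(product)
cube = cube_from_codes({"region": region_codes, "product": product_codes}, sales)

# Decode a group-by result only when presenting it to the user.
sales_by_region = {region_dict[key[0]]: total for key, total in cube[("region",)].items()}
```

Here each cuboid is keyed by tuples of codes, and values are decoded only at presentation time. A parallel variant could, under the same assumptions, partition the rows across workers and merge the per-partition totals.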