Abstract

Column-oriented databases have gained popularity as data-warehousing volumes and the performance problems of analytical queries have grown. Each attribute of a relation is physically stored as a separate column, which lets analytical queries run fast. The overhead is incurred in tuple reconstruction for multi-attribute queries. Each tuple reconstruction is a join of two columns based on tuple IDs, which makes it a significant cost component. To reduce this cost, a physical design can keep multiple presorted copies of each base table, so that tuples are already organized in different orders across the various columns. This paper proposes a novel design, called partitioning, that minimizes the tuple reconstruction cost. It achieves performance similar to that of presorted data, but without the heavy initial presorting step. In addition, it handles dynamic, unpredictable workloads with no idle time and frequent updates. Partitioning loads data directly into the respective partitions. Partitions are created on the fly and depend on the distribution of the data, which suits environments with limited storage space.
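To make the cost of tuple reconstruction concrete, the following is a minimal sketch of a join of two columns on tuple IDs. It is not the paper's implementation: the column layout (lists of `(tuple_id, value)` pairs) and the function names `reconstruct` and `select_then_reconstruct` are assumptions chosen for illustration.

```python
# Toy column store (hypothetical layout): each attribute is stored
# separately as a list of (tuple_id, value) pairs.

def reconstruct(col_a, col_b):
    """Rebuild (tuple_id, a, b) rows by joining two columns on tuple ID."""
    b_by_id = dict(col_b)  # tuple_id -> value of attribute b
    return [(tid, a, b_by_id[tid]) for tid, a in col_a if tid in b_by_id]

def select_then_reconstruct(col_a, col_b, low, high):
    """Filter on attribute a (low <= a < high), then fetch attribute b."""
    qualifying = [(tid, a) for tid, a in col_a if low <= a < high]
    return reconstruct(qualifying, col_b)

price = [(0, 30), (1, 10), (2, 20)]
quantity = [(0, 3), (1, 1), (2, 2)]
rows = select_then_reconstruct(price, quantity, 10, 25)
```

Every additional attribute in a query adds one more such join, which is why designs that keep the columns aligned in a common order, whether by presorting or by partitioning, aim to avoid the per-tuple lookup.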
