효율적인 스카이라인과 순위화 질의 계산 방법
- 효율적인 스카이라인과 순위화 질의 계산 방법
- Date Issued
- With the exponential growth in database sizes, fighting the information flood is of critical importance in modern database systems. Recently, skyline queries and top-k queries (or ranked queries) have gained considerable attention as an effective means for narrowing down the overwhelming amount of data. This dissertation focuses on developing efficient skyline and top-k computation.
First, skyline queries identify top-1 candidates without user-specific preferences. Because of the advantage of skyline queries, the existing skyline algorithms have focused on supporting efficient skyline computation. However, they mostly deal with maximizing the dominance of points. In contrast, we observe that the incomparability of points should be treated as another key factor in optimizing skyline computation. Specifically, we identify two key modules shared by the existing non-index skyline algorithms: pivot point selection and pruning. For pruning, we develop a model to analyze the cost of points, which guides our balancing act to pick a cost-optimal pivot point. We then implement our pivot selection in two algorithms that considers both dominance and incomparability.
Second, while skyline queries are useful for identifying interesting points from an entire dataset, they are known to return too many results for high-dimensional data. To address this problem, a skycube is introduced to efficiently provide users with multiple skylines with different strengths. For efficient skycube construction, state-of-the-art algorithms amortize redundant computation among subspace skylines (or cuboids) either (1) in a bottom-up fashion with the principle of sharing result or (2) in a top-down fashion with the principle of sharing structure. However, we observe further room for optimization in both principles. We aim at designing a more efficient skycube algorithm that shares multiple cuboids using more effective structures. Specifically, we first develop each principle by leveraging multiple parents and a skytree, representing recursive point-based space partitioning. We then design an efficient skycube algorithm exploiting these principles together.
Lastly, top-k queries only return the best k points satisfying users’ fuzzy information needs. We study the problem of constructing an indexing structure that efficiently computes top-k queries for varying scoring functions and retrieval sizes. The existing top-k algorithms can be categorized into three classes: list-, layer-, and view-based approaches. Among them, we focus on the layer-based approach, pre-materializing points into consecutive multiple layers. The layer-based index enables us to return top-k answers efficiently by restricting access to points in the k layers. However, we observe that the number of points accessed in each layer can be reduced further. For this purpose, we propose a dual-resolution layer structure. Specifically, we iteratively build coarse-level layers using skylines, and divide each coarse-level layer into fine-level sublayers using convex skylines. The dual-resolution layer is able to leverage not only the dominance relationship between coarse-level layers, named 8-dominance, but also a relaxed dominance relationship between fine-level sublayers, named 9-dominance.
In this dissertation, our main contributions are summarized as: (1) a scalable skyline algorithm exploiting the well-balanced pivot point selection, (2) an efficient skycube algorithm optimizing the principles of sharing result and sharing structure, and (3) an efficient top-k indexing structure with dual-resolution layers.
- Article Type
- Files in This Item:
- There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.