Abstract
Query optimizers depend heavily on statistics representing
column distributions to create efficient query plans. In many
cases, though, statistics are outdated or non-existent, and the
process of refreshing statistics is very expensive, especially
for ad-hoc workloads on ever bigger data. This results in
suboptimal plans that severely hurt performance. The main
problem is that any decision, once made by the optimizer, is
fixed throughout the execution of a query. In particular, each
logical operator translates into a fixed choice of a physical
operator at run-time. In this paper, we advocate for continuous
adaptation and morphing of physical operators throughout their
lifetime, by adjusting their behavior in accordance with the
statistical properties of the data. We demonstrate the benefits
of the new paradigm by designing and implementing an adaptive
access path operator called Smooth Scan, which morphs
continuously within the space of traditional index access and
full table scan. Smooth Scan behaves similarly to an index scan
for low selectivity; if selectivity increases, however, Smooth
Scan progressively morphs its behavior toward a sequential scan.
As a result, a system with Smooth Scan requires no access path
decisions up front nor does it need accurate statistics to
provide good performance. We implement Smooth Scan in PostgreSQL
and, using both synthetic benchmarks as well as TPC-H, we show
that it achieves robust performance while at the same time being
statistics-oblivious.
Links and resources
Tags
community