Abstract
We introduce scalable deep kernels, which combine the structural properties
of deep learning architectures with the non-parametric flexibility of kernel
methods. Specifically, we transform the inputs of a spectral mixture base
kernel with a deep architecture, using local kernel interpolation, inducing
points, and structure-exploiting (Kronecker and Toeplitz) algebra for a
scalable kernel representation. These closed-form kernels can be used as
drop-in replacements for standard kernels, with benefits in expressive power
and scalability. We jointly learn the properties of these kernels through the
marginal likelihood of a Gaussian process. Inference and learning cost $O(n)$
for $n$ training points, and predictions cost $O(1)$ per test point. On a large
and diverse collection of applications, including a dataset with 2 million
examples, we show improved performance over scalable Gaussian processes with
flexible kernel learning models, and stand-alone deep architectures.
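For concreteness, the sketch below illustrates the deep kernel idea, k(x, x') = k_base(g(x; w), g(x'; w)), with the network weights and kernel hyperparameters learned jointly through the GP marginal likelihood, as the abstract describes. This is a minimal illustrative sketch, not the authors' implementation: it substitutes an RBF base kernel for the spectral mixture kernel and uses exact O(n^3) inference, omitting the local kernel interpolation, inducing points, and Kronecker/Toeplitz algebra that give the paper its O(n) training and O(1) prediction costs. All class and function names here are hypothetical.

import torch
import torch.nn as nn

# Deep kernel sketch: a small network g(x; w) feeding an RBF base kernel.
# (The paper uses a spectral mixture base kernel plus local kernel interpolation,
# inducing points, and Kronecker/Toeplitz algebra; all omitted here for brevity.)
class DeepRBFKernel(nn.Module):
    def __init__(self, in_dim, feat_dim=2):
        super().__init__()
        self.net = nn.Sequential(                    # deep transformation g(x; w)
            nn.Linear(in_dim, 32), nn.ReLU(),
            nn.Linear(32, feat_dim),
        )
        self.log_lengthscale = nn.Parameter(torch.zeros(()))

    def forward(self, x1, x2):
        z1, z2 = self.net(x1), self.net(x2)          # map inputs through the network
        d2 = (z1.unsqueeze(1) - z2.unsqueeze(0)).pow(2).sum(-1)   # squared distances
        return torch.exp(-0.5 * d2 / self.log_lengthscale.exp() ** 2)

# Negative log marginal likelihood of a zero-mean GP:
# 0.5 * y^T (K + sigma^2 I)^{-1} y + 0.5 * log|K + sigma^2 I| (+ constant).
def neg_log_marginal_likelihood(kernel, x, y, noise=0.1):
    K = kernel(x, x) + noise ** 2 * torch.eye(len(x))
    L = torch.linalg.cholesky(K)
    alpha = torch.cholesky_solve(y.unsqueeze(-1), L)
    return 0.5 * (y.unsqueeze(-1) * alpha).sum() + L.diagonal().log().sum()

# Jointly learn network weights and kernel hyperparameters from one objective.
x = torch.randn(100, 5)
y = torch.sin(x.sum(dim=1))
kernel = DeepRBFKernel(in_dim=5)
opt = torch.optim.Adam(kernel.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = neg_log_marginal_likelihood(kernel, x, y)
    loss.backward()
    opt.step()

The point of the sketch is the training setup: a single optimizer step updates both the network weights and the base-kernel hyperparameters from the same marginal-likelihood objective, rather than training the deep architecture separately and fixing its features.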