Abstract
We propose pixelNeRF, a learning framework that predicts a continuous neural
scene representation conditioned on one or few input images. The existing
approach for constructing neural radiance fields involves optimizing the
representation for every scene independently, requiring many calibrated views
and significant compute time. We take a step towards resolving these
shortcomings by introducing an architecture that conditions a NeRF on image
inputs in a fully convolutional manner. This allows the network to be trained
across multiple scenes to learn a scene prior, enabling it to perform novel
view synthesis in a feed-forward manner from a sparse set of views (as few as
one). Leveraging the volume rendering approach of NeRF, our model can be
trained directly from images with no explicit 3D supervision. We conduct
extensive experiments on ShapeNet benchmarks for single image novel view
synthesis tasks with held-out objects as well as entire unseen categories. We
further demonstrate the flexibility of pixelNeRF by applying it to
multi-object ShapeNet scenes and real scenes from the DTU dataset. In all
cases, pixelNeRF outperforms current state-of-the-art baselines for novel view
synthesis and single image 3D reconstruction. For the video and code, please
visit the project website: https://alexyu.net/pixelnerf
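
For intuition, the following is a minimal PyTorch-style sketch (not the authors' released code) of the image-conditioned query described in the abstract: a 3D point is projected into the input view, a fully convolutional image feature is bilinearly sampled at that pixel, and an MLP maps the encoded point, view direction, and feature to density and color. The module names, dimensions, and toy encoder are illustrative assumptions; the actual implementation is available at the project website.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def positional_encoding(x, num_freqs=6):
    # NeRF-style sinusoidal encoding of 3D coordinates.
    freqs = (2.0 ** torch.arange(num_freqs, dtype=torch.float32, device=x.device)) * math.pi
    angles = x[..., None] * freqs                                  # (N, 3, num_freqs)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return torch.cat([x, enc.flatten(-2)], dim=-1)                 # (N, 3 + 6*num_freqs)

class ImageConditionedField(nn.Module):
    """Toy image-conditioned radiance field: each query point is projected into
    the input view, a CNN feature is sampled there, and an MLP predicts
    (density, RGB). This stands in for the pixelNeRF conditioning idea only."""

    def __init__(self, feat_dim=64, hidden=128, num_freqs=6):
        super().__init__()
        # Small fully convolutional encoder standing in for a ResNet backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        in_dim = 2 * (3 + 6 * num_freqs) + feat_dim   # point enc + dir enc + image feature
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                      # density + RGB
        )
        self.num_freqs = num_freqs

    def forward(self, image, points, dirs, world_to_cam, intrinsics):
        """image: (1, 3, H, W); points, dirs: (N, 3) in world coordinates;
        world_to_cam: (4, 4) pose; intrinsics: (fx, fy, cx, cy)."""
        feat_map = self.encoder(image)                               # (1, C, H', W')
        # Transform query points into the input camera frame and project (pinhole).
        cam_pts = points @ world_to_cam[:3, :3].T + world_to_cam[:3, 3]
        uv = cam_pts[:, :2] / cam_pts[:, 2:3].clamp(min=1e-6)
        uv = uv * intrinsics[:2] + intrinsics[2:]                    # pixel coordinates
        # Normalize to [-1, 1] and bilinearly sample a feature per point.
        H, W = image.shape[-2:]
        grid = torch.stack([uv[:, 0] / W * 2 - 1, uv[:, 1] / H * 2 - 1], dim=-1)
        feats = F.grid_sample(feat_map, grid.view(1, -1, 1, 2), align_corners=True)
        feats = feats.view(feat_map.shape[1], -1).T                  # (N, C)
        x = torch.cat([positional_encoding(points, self.num_freqs),
                       positional_encoding(dirs, self.num_freqs), feats], dim=-1)
        out = self.mlp(x)
        sigma, rgb = F.relu(out[:, :1]), torch.sigmoid(out[:, 1:])
        return sigma, rgb
```

In this sketch the predicted densities and colors would be composited along camera rays with the standard NeRF volume rendering equation, so training can be supervised with image reconstruction losses alone, consistent with the no-explicit-3D-supervision claim above.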