Abstract
When we observe a set of data points, we usually adopt a model with
a finite number of parameters and fit those parameters so as to best
explain the data. It is known that the best performance is achieved
at an optimal number of parameters: the more complicated the observed
data, the more parameters are needed, and vice versa.
This optimal number is predicted by `information criteria' such as
the Akaike Information Criterion (AIC) and the Bayesian Information
Criterion (BIC).
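As a minimal sketch of this kind of model selection (not part of the abstract itself): assuming Gaussian residuals, so that up to a constant $-2\ln L = n\ln(\mathrm{RSS}/n)$, the optimal polynomial degree can be chosen by minimizing AIC over candidate fits. All function and variable names here are illustrative.

```python
import numpy as np

def select_degree_by_aic(x, y, max_degree=8):
    """Fit polynomials of degree 0..max_degree and score each with
    AIC = 2k - 2 ln L.  For Gaussian residuals with the maximum-likelihood
    noise variance, -2 ln L = n * ln(RSS / n) up to an additive constant,
    so only RSS and the parameter count k enter the comparison."""
    n = len(x)
    scores = {}
    for d in range(max_degree + 1):
        coeffs = np.polyfit(x, y, d)
        resid = y - np.polyval(coeffs, x)
        rss = float(resid @ resid)
        k = d + 1  # number of fitted parameters
        scores[d] = 2 * k + n * np.log(rss / n)
    return min(scores, key=scores.get), scores

# Illustration: noisy cubic data -- AIC should penalize both
# underfitting (too few parameters) and overfitting (too many).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = x**3 - x + 0.1 * rng.normal(size=x.size)
best, scores = select_degree_by_aic(x, y)
```

BIC would replace the `2 * k` penalty with `k * np.log(n)`, penalizing extra parameters more strongly for large samples.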
On the other hand, when the data are generated by the same model as the
one we hypothesized, the best performance is achieved by parameter tuning.
We can tune the parameters of the model according to a prescription
such as maximization of the marginal likelihood.
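As a concrete (illustrative) instance of this prescription: in Bayesian linear regression with a Gaussian prior of precision $\alpha$ and noise precision $\beta$, the marginal likelihood is available in closed form, $y \sim N(0,\, \beta^{-1}I + \alpha^{-1}XX^{\top})$, so the prior hyperparameter can be tuned by maximizing the log evidence. The names and grid below are assumptions for the sketch, not the authors' method.

```python
import numpy as np

def log_evidence(X, y, alpha, beta):
    """Log marginal likelihood ln p(y | alpha, beta) for the model
    y = X w + noise, with prior w ~ N(0, alpha^{-1} I) and noise
    variance beta^{-1}.  Integrating w out gives y ~ N(0, C) with
    C = beta^{-1} I + alpha^{-1} X X^T."""
    n = len(y)
    C = np.eye(n) / beta + (X @ X.T) / alpha
    _, logdet = np.linalg.slogdet(C)
    quad = y @ np.linalg.solve(C, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

# Toy data: a known linear rule plus noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=30)

# Tune the prior precision alpha by maximizing the evidence on a grid.
alphas = np.logspace(-3, 3, 25)
best_alpha = max(alphas, key=lambda a: log_evidence(X, y, a, beta=100.0))
```

In practice the same maximization is often done by gradient ascent or EM-style fixed-point updates rather than a grid, but the objective is identical.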
These two optimizations appear to be different from each other, but they
can be understood from the single viewpoint of `parameter scaling', as
shown in the pioneering work of Nemenman and Bialek [1] for the
density estimation problem.
In this presentation, we show that the above two kinds of optimization are
described in a unified form by a new type of information criterion related to
the renormalization group equation. In the context of non-parametric models
(e.g. [2]), we clarify the notions and roles of renormalization and the
renormalization group in Bayesian statistical inference, which lead to
improved learning performance for a broader class of models.
References:
[1] I. Nemenman and W. Bialek, Phys. Rev. E 65 (2002) 026137.
[2] W. Bialek, C. G. Callan and S. P. Strong, Phys. Rev. Lett. 77 (1996) 4693.