Abstract
We consider a high-dimensional mixture of two Gaussians in the noisy regime
where even an oracle knowing the centers of the clusters misclassifies a small
but finite fraction of the points. We provide a rigorous analysis of the
generalization error of regularized convex classifiers, including ridge, hinge
and logistic regression, in the high-dimensional limit where the number $n$ of
samples and their dimension $d$ go to infinity while their ratio is fixed to
$\alpha = n/d$. We discuss surprising effects of the regularization, which in some
cases allows one to reach Bayes-optimal performance. We also illustrate the
interpolation peak at low regularization, and analyze the role of the
respective sizes of the two clusters.
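
As a rough numerical illustration of this setting, the sketch below samples a balanced two-Gaussian mixture, fits a ridge classifier in closed form, and compares its exact generalization error with that of an oracle that knows the cluster centers. The conventions are assumptions made for this sketch and are not taken from the paper: points are $x = y\mu + \sigma z$ with $y = \pm 1$, the noise is isotropic, there is no bias term, and the names `sample_mixture`, `ridge_classifier`, `oracle_error` and `generalization_error` are hypothetical helpers.

```python
import math
import numpy as np

def sample_mixture(n, d, mu, sigma=1.0, rng=None):
    """Draw n points x = y*mu + sigma*z with equiprobable labels y in {-1, +1}."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.where(rng.random(n) < 0.5, 1.0, -1.0)
    x = y[:, None] * mu[None, :] + sigma * rng.standard_normal((n, d))
    return x, y

def ridge_classifier(x, y, lam):
    """Closed-form ridge regression on the +/-1 labels (assumption: no bias term)."""
    d = x.shape[1]
    return np.linalg.solve(x.T @ x + lam * np.eye(d), x.T @ y)

def gaussian_tail(t):
    """Q(t) = P(Z > t) for a standard normal Z."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def oracle_error(mu, sigma=1.0):
    """Error of sign(<mu, x>), the classifier that knows the centers."""
    return gaussian_tail(float(np.linalg.norm(mu)) / sigma)

def generalization_error(w, mu, sigma=1.0):
    """Exact error of sign(<w, x>) under the mixture assumed above."""
    return gaussian_tail(float(w @ mu) / (sigma * float(np.linalg.norm(w))))

# Assumed toy sizes: alpha = n/d = 2, centers at distance 1.5 from the origin.
rng = np.random.default_rng(0)
d, n = 200, 400
mu = np.zeros(d)
mu[0] = 1.5
x, y = sample_mixture(n, d, mu, rng=rng)

for lam in (1e-3, 1.0, 1e3):
    w = ridge_classifier(x, y, lam)
    print(f"lambda={lam:g}  ridge error={generalization_error(w, mu):.4f}  "
          f"oracle error={oracle_error(mu):.4f}")
```

Under these assumptions, no linear classifier without a bias can beat the oracle value $Q(\lVert\mu\rVert/\sigma)$, so the printed ridge errors only show how close a given regularization strength gets to it at finite size.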