Abstract
Naive Bayes is one of the most efficient and effective inductive learning
algorithms for machine learning and data mining. Its competitive
performance in classification is surprising, because the conditional
independence assumption on which it is based is rarely true in real-world
applications. An open question is: what is the true reason for the
surprisingly good performance of naive Bayes in classification? In
this paper, we propose a novel explanation of the superb classification
performance of naive Bayes. We show that, essentially, the dependence
distribution plays a crucial role; that is, how the local dependence of a
node is distributed in each class, evenly or unevenly, and how the local
dependences of all nodes work together, consistently (supporting a certain
classification) or inconsistently (canceling each other out).
Therefore, no matter how strong the dependences among attributes
are, naive Bayes can still be optimal if the dependences are distributed
evenly in the classes, or if the dependences cancel each other out. We
propose and prove a sufficient and necessary condition for the optimality
of naive Bayes. Further, we investigate the optimality of naive Bayes
under the Gaussian distribution. We present and prove a sufficient
condition for the optimality of naive Bayes in which dependences
between attributes do exist. This provides evidence that dependences
among attributes may cancel each other out. In addition, we explore
when naive Bayes works well.
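To make the conditional independence assumption discussed above concrete, the following is a minimal sketch (not code from the paper) of a Gaussian naive Bayes classifier: each attribute's likelihood is modeled per class as if the attributes were independent, and the class maximizing the resulting product (here, a log-sum) is chosen. The function names and the small variance-smoothing constant are assumptions of this sketch.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors and per-attribute Gaussian parameters."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),        # prior P(c)
                     Xc.mean(axis=0),          # per-attribute means
                     Xc.var(axis=0) + 1e-9)    # per-attribute variances (smoothed)
    return params

def predict(params, x):
    """Pick the class maximizing log P(c) + sum_i log P(x_i | c)."""
    best, best_score = None, -np.inf
    for c, (prior, mu, var) in params.items():
        # Independence assumption: the joint likelihood factorizes over attributes.
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        score = np.log(prior) + log_lik
        if score > best_score:
            best, best_score = c, score
    return best
```

The factorized likelihood ignores correlations between attributes; the paper's argument is that such correlations need not hurt classification when they are distributed evenly across classes or cancel each other out.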