EM has been shown to have favorable properties: it converges reliably, satisfies parameter constraints automatically, and converges quickly. The next section explains the traditional approach to deriving the EM algorithm and proving its convergence. Section 3.3 covers the interpretation of the EM algorithm as the maximization of two quantities: the entropy and the expectation of the complete-data likelihood. Then, the K-means algorithm and the EM algorithm are compared, and the conditions under which the EM algorithm reduces to K-means are explained. The discussion in Section 3.4 generalizes the EM algorithm described in Sections 3.2 and 3.3 to problems with partial data and hidden states; we refer to this new type of EM as the doubly stochastic EM. Finally, the chapter is concluded in Section 3.5.
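To preview the EM/K-means connection mentioned above, the following is a minimal sketch (not the chapter's derivation) of EM for an assumed spherical Gaussian mixture with equal weights and a shared, fixed variance `sigma2`; all names and parameters here are illustrative. As `sigma2` shrinks, the E-step's soft responsibilities harden into nearest-mean assignments, and the M-step becomes the K-means centroid update.

```python
# Sketch: EM for a spherical Gaussian mixture with equal component weights
# and a fixed shared variance sigma2 (an assumed, simplified model).
import numpy as np

def em_spherical_gmm(X, K, sigma2, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)]  # initialize means from data
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] proportional to
        # exp(-||x_n - mu_k||^2 / (2 * sigma2))
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        logr = -d2 / (2 * sigma2)
        logr -= logr.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: each mean is the responsibility-weighted average of the data
        mu = (r[:, :, None] * X[:, None, :]).sum(0) / r.sum(0)[:, None]
    return mu, r

# Two well-separated synthetic clusters; with a small sigma2 each row of r
# approaches a one-hot vector, so the M-step averages only the points
# nearest each mean -- exactly the K-means update.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.1, (20, 2)),
               rng.normal([5, 5], 0.1, (20, 2))])
mu, r = em_spherical_gmm(X, K=2, sigma2=0.01)
```

In the small-variance limit the soft E-step degenerates to a hard nearest-centroid assignment, which is one of the conditions under which EM reduces to K-means.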