Abhijeet Mulgund's Personal Webpage

Bayes Soft Classifier is Cross-Entropy Minimizer

Last updated Nov 1, 2022

# Statement

Let $X, Y, n$ be a Classification Problem. Then the Likelihood Soft Classifier minimizes Cross-Entropy. It is the unique minimizer up to modifications on $\mathbb{P}$-Null Sets of $\mathcal{D}$.

# Proof

Recall that we can write Cross-Entropy as $$CE(g) = \mathbb{H}(Y|X) + \mathbb{E}\Big[D_{KL}(\mathbb{P}(Y|X) || g(X))\Big]$$
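For reference, here is a sketch of where this decomposition comes from. It assumes the usual definition $CE(g) = \mathbb{E}[-\log g(X)_Y]$, where $g(X)_y$ denotes the probability that $g(X)$ assigns to class $y$, and that the classes are labeled $1, \dots, n$; this notation is an assumption and is not fixed by this note. The second line uses the tower property of conditional expectation.

$$\begin{aligned}
CE(g) &= \mathbb{E}\big[-\log g(X)_Y\big] \\
&= \mathbb{E}\Big[-\sum_{y=1}^{n} \mathbb{P}(Y=y|X) \log g(X)_y\Big] \\
&= \mathbb{E}\Big[-\sum_{y=1}^{n} \mathbb{P}(Y=y|X) \log \mathbb{P}(Y=y|X)\Big] + \mathbb{E}\Big[\sum_{y=1}^{n} \mathbb{P}(Y=y|X) \log \frac{\mathbb{P}(Y=y|X)}{g(X)_y}\Big] \\
&= \mathbb{H}(Y|X) + \mathbb{E}\Big[D_{KL}(\mathbb{P}(Y|X) || g(X))\Big]
\end{aligned}$$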

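The first term, $\mathbb{H}(Y|X)$, does not depend on $g$, so minimizing $CE(g)$ is equivalent to minimizing the expected KL divergence. Gibbs' Inequality tells us that $D_{KL}(\mathbb{P}(Y|X) || g(X)) \geq 0$, with equality precisely when $g(X) = \mathbb{P}(Y|X)$. The expectation is therefore minimized precisely when $g(X) = \mathbb{P}(Y|X)$ almost surely, that is, when $g$ agrees with the Likelihood Soft Classifier outside a $\mathbb{P}$-Null Set. $\blacksquare$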
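The argument can be sanity-checked numerically on a small discrete problem. The sketch below is only an illustration under stated assumptions: the distributions `p_x` and `p_y_given_x` are made-up toy values, the function names are hypothetical, and natural logarithms are used throughout. It checks that the decomposition holds for a few candidate classifiers and that the true conditional distribution attains the smallest Cross-Entropy among them.

```python
import math

# Hypothetical toy problem (assumed values, not from the note):
# X takes 2 values, Y takes n = 3 classes.
p_x = [0.4, 0.6]                      # P(X = x)
p_y_given_x = [[0.7, 0.2, 0.1],       # P(Y = y | X = 0)
               [0.1, 0.3, 0.6]]       # P(Y = y | X = 1)


def cross_entropy(g):
    """CE(g) = E[-log g(X)_Y] for a soft classifier g given as rows g[x][y]."""
    return -sum(
        px * sum(p * math.log(q) for p, q in zip(p_row, g_row))
        for px, p_row, g_row in zip(p_x, p_y_given_x, g)
    )


def conditional_entropy():
    """H(Y|X) = E[-sum_y P(y|X) log P(y|X)]."""
    return -sum(
        px * sum(p * math.log(p) for p in p_row)
        for px, p_row in zip(p_x, p_y_given_x)
    )


def expected_kl(g):
    """E[D_KL(P(Y|X) || g(X))]."""
    return sum(
        px * sum(p * math.log(p / q) for p, q in zip(p_row, g_row))
        for px, p_row, g_row in zip(p_x, p_y_given_x, g)
    )


# Candidate soft classifiers: the true conditional, a uniform guess,
# and the true conditional with its rows swapped.
candidates = {
    "likelihood": p_y_given_x,
    "uniform": [[1 / 3] * 3, [1 / 3] * 3],
    "swapped": [p_y_given_x[1], p_y_given_x[0]],
}

for name, g in candidates.items():
    ce = cross_entropy(g)
    decomposed = conditional_entropy() + expected_kl(g)
    print(f"{name}: CE = {ce:.6f}, H(Y|X) + E[KL] = {decomposed:.6f}")
```

In this toy setting, the `likelihood` candidate has zero expected KL divergence, so its Cross-Entropy equals $\mathbb{H}(Y|X)$, while the other candidates incur a strictly positive penalty.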