Some notes about Entropy

Kullback–Leibler divergence (Wikipedia): a non-symmetric measure of the difference between two distributions:

    \[D_{KL}(P||Q) = \sum_i{P(i)\log\frac{P(i)}{Q(i)}}\]
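A minimal sketch of the formula in Python (my own helper, not from any library), with distributions as probability lists and entropies in nats:

```python
import math

def kl_divergence(p, q):
    """D_KL(P||Q) for discrete distributions given as equal-length probability lists (nats)."""
    # Terms with P(i) = 0 contribute 0 by the convention 0 log 0 = 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p, q = [0.9, 0.1], [0.5, 0.5]
print(kl_divergence(p, q))  # ≈ 0.368 nats; kl_divergence(q, p) ≈ 0.511 — not symmetric
```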

Conditional Entropy:

    \[H(Y|X) = -\sum_{x}{p(x) \sum_{y}{p(y|x) \log p(y|x)} } = -\sum_{x,y}{p(x,y) \log \frac{p(x,y)}{p(x)}}\]
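In code, taking the joint distribution as a dict keyed by (x, y) pairs (a sketch of mine, nats again):

```python
import math
from collections import defaultdict

def conditional_entropy(pxy):
    """H(Y|X) in nats, from a joint distribution pxy: {(x, y): p(x, y)}."""
    px = defaultdict(float)  # marginal p(x)
    for (x, _), p in pxy.items():
        px[x] += p
    # H(Y|X) = -sum_{x,y} p(x,y) log( p(x,y) / p(x) )
    return -sum(p * math.log(p / px[x]) for (x, _), p in pxy.items() if p > 0)
```

If Y is a deterministic function of X (e.g. `{(0, 0): 0.5, (1, 1): 0.5}`), this returns zero; for independent fair coins it returns H(Y) = log 2.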

Joint Entropy:

    \[H(X,Y) = -\sum_{x,y}{p(x,y) \log p(x,y) }\]

    \[H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)\]

A non-symmetric measure of association between X and Y, the "uncertainty coefficient", measures the fraction of the entropy of Y that is removed when X is given:

    \[U(Y|X) = \frac{H(Y) - H(Y|X)}{H(Y)} = \frac{I(X;Y)}{H(Y)}\]

Or the fraction of the entropy of X that is removed when Y is given:

    \[U(X|Y) = \frac{H(X) - H(X|Y)}{H(X)} = \frac{I(X;Y)}{H(X)}\]

where I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X,Y) - H(X|Y) - H(Y|X) is the mutual information of X and Y, which is non-negative and symmetric.

Anyway, U(Y|X) equals 0 if X and Y are independent (no association), and 1 if knowing X fully determines Y (i.e. Y is a function of X).
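The two extremes are easy to verify with a quick sketch (helpers are my own, nats throughout):

```python
import math
from collections import defaultdict

def entropy(d):
    return -sum(p * math.log(p) for p in d.values() if p > 0)

def marginals(pxy):
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in pxy.items():
        px[x] += p
        py[y] += p
    return px, py

def u_y_given_x(pxy):
    """Uncertainty coefficient U(Y|X) = I(X;Y) / H(Y)."""
    px, py = marginals(pxy)
    mi = entropy(px) + entropy(py) - entropy(pxy)  # I(X;Y)
    return mi / entropy(py)

independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
functional = {(0, 0): 0.5, (1, 1): 0.5}  # Y = X
print(u_y_given_x(independent), u_y_given_x(functional))  # → 0.0 1.0
```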
A symmetric measure can be made of a weighted average of U(Y|X) and U(X|Y):

    \[U(X,Y) = \frac{H(X)U(X|Y) + H(Y)U(Y|X)}{H(X)+H(Y)}\]

    \[= 2 \Big[ \frac{H(X)+H(Y)-H(X,Y)}{H(X)+H(Y)} \Big]\]
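The second form is convenient to implement directly; the symmetry is then clear from the code (again a sketch of mine, joint distribution as a dict):

```python
import math
from collections import defaultdict

def entropy(d):
    return -sum(p * math.log(p) for p in d.values() if p > 0)

def symmetric_u(pxy):
    """U(X,Y) = 2 [H(X) + H(Y) - H(X,Y)] / [H(X) + H(Y)]."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in pxy.items():
        px[x] += p
        py[y] += p
    hx, hy = entropy(px), entropy(py)
    return 2.0 * (hx + hy - entropy(pxy)) / (hx + hy)

# Symmetric: swapping the roles of X and Y gives the same value.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
swapped = {(y, x): p for (x, y), p in pxy.items()}
assert abs(symmetric_u(pxy) - symmetric_u(swapped)) < 1e-12
```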

Relation to \(\chi^2\) or measures like Cramér's V:

    \[\chi^2 = N \cdot \sum_{x,y}{\frac{(p(x,y)-p(x)p(y))^2}{p(x)p(y)}}\]

No obvious relation. Generated some 2×2 contingency tables and plotted their U(X,Y) vs Cramér's V:
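Something along these lines reproduces the experiment (a sketch only — the original table-generation scheme isn't shown, so the uniform random counts here are my own choice; plotting is omitted to keep it self-contained). For 2×2 tables Cramér's V reduces to \(\sqrt{\chi^2/N}\), since min(r-1, c-1) = 1:

```python
import math
import random
from collections import defaultdict

def entropy(d):
    return -sum(p * math.log(p) for p in d.values() if p > 0)

def stats(table):
    """Return (U(X,Y), Cramér's V) for a 2x2 count table [[a, b], [c, d]]."""
    n = float(sum(sum(row) for row in table))
    pxy = {(i, j): table[i][j] / n for i in (0, 1) for j in (0, 1) if table[i][j]}
    px, py = defaultdict(float), defaultdict(float)
    for (i, j), p in pxy.items():
        px[i] += p
        py[j] += p
    hx, hy = entropy(px), entropy(py)
    u = 2.0 * (hx + hy - entropy(pxy)) / (hx + hy)
    # chi^2 = N * sum (p(x,y) - p(x)p(y))^2 / (p(x)p(y))
    chi2 = n * sum((pxy.get((i, j), 0.0) - px[i] * py[j]) ** 2 / (px[i] * py[j])
                   for i in (0, 1) for j in (0, 1))
    v = math.sqrt(chi2 / n)  # Cramér's V for a 2x2 table
    return u, v

random.seed(0)
points = [stats([[random.randint(1, 50) for _ in range(2)] for _ in range(2)])
          for _ in range(200)]  # (U, V) pairs to scatter-plot
```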
