Explain what capacity implies in practice for an error correction code
The capacity is the highest rate at which an error correction code can operate while the probability of decoding error can still be made arbitrarily small; at rates above capacity, reliable decoding is impossible
Capacity
C = max_{p(x)} I(X;Y), the maximum taken over all input pmfs p(x)
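As a sketch of what the maximisation gives in a concrete case: for a binary symmetric channel with crossover probability p, the uniform input pmf achieves the maximum and the capacity works out to 1 − H_b(p) (binary entropy):

```python
import math

def h_b(p):
    """Binary entropy H_b(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Capacity of a binary symmetric channel (BSC) with crossover probability p:
# the uniform input pmf maximises I(X;Y), giving C = 1 - H_b(p).
p = 0.1
C = 1 - h_b(p)
print(round(C, 3))  # 0.531 bits per channel use
```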
Conditional Entropy Inequality
“Information can’t hurt”: conditioning can only reduce (or preserve) entropy
H(X | Y) <= H(X)
Chain Rule of Entropy
H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
Conditional Entropy
H(Y|X) = SUM_x p(x) H(Y|X=x)
= - SUM_x SUM_y p(x) p(y|x) log p(y|x)
Chain Rule of Conditional Entropy
H(X, Y|Z) = H(Y|Z) + H(X|Y, Z) = H(X|Z) + H(Y|X, Z)
Joint Entropy
H(X, Y) = - SUM_xy p(x,y) log p(x,y)
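A quick numerical sanity check (Python, with a made-up joint pmf) that the joint entropy matches the chain rule H(X,Y) = H(X) + H(Y|X):

```python
import math

# Hypothetical joint pmf p(x, y) over X, Y in {0, 1}
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

H_XY = -sum(p * math.log2(p) for p in p_xy.values())

# Marginal p(x), then H(X) and H(Y|X) from their definitions
p_x = {x: sum(p for (a, _), p in p_xy.items() if a == x) for x in (0, 1)}
H_X = -sum(p * math.log2(p) for p in p_x.values())
H_Y_given_X = -sum(p * math.log2(p / p_x[x]) for (x, _), p in p_xy.items())

# Chain rule: H(X, Y) = H(X) + H(Y|X)
print(abs(H_XY - (H_X + H_Y_given_X)) < 1e-9)  # True
```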
Relative Entropy (Kullback-Leibler Distance)
D(p||r) = SUM_x p(x) log p(x)/r(x)
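A minimal sketch (hypothetical pmfs) of two standard facts about D(p||r): it is nonnegative, and it is not symmetric, so it is not a true distance:

```python
import math

def kl(p, r):
    """Relative entropy D(p || r) in bits; assumes r(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / rx) for px, rx in zip(p, r) if px > 0)

p = [0.5, 0.5]
r = [0.75, 0.25]
print(kl(p, r) >= 0)          # True: D(p||r) is always nonnegative
print(kl(p, r) == kl(r, p))   # False: D is not symmetric
```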
Bayes Theorem
P(B|A) = P(A|B) × P(B)/ P(A)
Conditional Probability
P(B|A) = P(A ∩ B) / P(A)
Total Probability
if {A1,A2, . . . ,Ak } is a partition of Ω,
then for any B ⊂ Ω
P(B) = SUM(i=1 -> k) P(B|A_i)P(A_i)
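The total-probability formula is easy to check with hypothetical numbers:

```python
# Hypothetical partition {A1, A2} of the sample space
P_A = [0.3, 0.7]          # P(A1), P(A2): a partition, so these sum to 1
P_B_given_A = [0.9, 0.2]  # P(B|A1), P(B|A2)

# P(B) = SUM_i P(B|A_i) P(A_i)
P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))
print(round(P_B, 2))  # 0.41
```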
Mutual Information
I(X;Y) = H(Y) − H(Y|X) = H(X) − H(X|Y)
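Using the chain-rule identity H(Y|X) = H(X,Y) − H(X), mutual information can be computed from a (hypothetical) joint pmf as I(X;Y) = H(X) + H(Y) − H(X,Y):

```python
import math

# Hypothetical joint pmf over X, Y in {0, 1}
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Marginals by summing out the other variable
p_x = [sum(p for (a, _), p in p_xy.items() if a == x) for x in (0, 1)]
p_y = [sum(p for (_, b), p in p_xy.items() if b == y) for y in (0, 1)]

# I(X;Y) = H(X) + H(Y) - H(X,Y)  (= H(Y) - H(Y|X), by the chain rule)
I = H(p_x) + H(p_y) - H(p_xy.values())
print(round(I, 4))  # 0.2781 bits shared between X and Y
```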
Entropy
H(X) = - SUM_x p(x) log p(x)
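A worked example: for p = (1/2, 1/4, 1/4), H(X) = 1/2·1 + 1/4·2 + 1/4·2 = 1.5 bits:

```python
import math

# Entropy in bits; terms with p = 0 contribute 0 by convention
def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.25, 0.25]))  # 1.5
```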
Two key properties of Entropy (value)
Always nonnegative
Only depends on the probabilities in the pmf of the RV
Binomial pmf
X ∼ Bi(n, q)
p(k) = (nCk) (q^k)(1-q)^(n-k) for k = 0,1,2,…,n
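A sketch in Python (math.comb gives the binomial coefficient nCk); the pmf values should sum to 1:

```python
from math import comb

def binom_pmf(k, n, q):
    """P(X = k) for X ~ Bi(n, q): (nCk) q^k (1-q)^(n-k)."""
    return comb(n, k) * q**k * (1 - q)**(n - k)

n, q = 5, 0.3
probs = [binom_pmf(k, n, q) for k in range(n + 1)]
print(abs(sum(probs) - 1) < 1e-12)  # True: the pmf sums to 1
```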
Joint Distribution
Pr(X = x, Y = y) = p(x, y)
Conditional Distribution
p(y|x) = p(x, y)/ p(x)
Marginalisation
p(x) = SUM_y p(x,y)
Independence of RV
p(x,y) = p(x)p(y) for all x, y
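A small check tying marginalisation and independence together (hypothetical pmf values):

```python
# Building p(x, y) = p(x) p(y) from hypothetical marginals makes X, Y independent
p_x = {0: 0.25, 1: 0.75}
p_y = {0: 0.6, 1: 0.4}
p_xy = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

# Marginalisation recovers p(x): p(x) = SUM_y p(x, y)
marg_x = {x: sum(p_xy[(x, y)] for y in p_y) for x in p_x}
print(all(abs(marg_x[x] - p_x[x]) < 1e-12 for x in p_x))  # True
```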
Expectation of a discrete RV
E(X) = SUM_x p(x) x
Expectation of two RVs
E(X+Y) = E(X) + E(Y) always (linearity holds even when X and Y are dependent);
if X and Y are independent, then E(XY) = E(X)E(Y)
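A numerical check using two independent fair dice: linearity of expectation holds for the sum, and independence makes the expectation of the product factorise (note it is E(XY), not E(X+Y), that factorises):

```python
from itertools import product

# Two independent fair dice; exact expectations from the (product) joint pmf
pmf = {k: 1 / 6 for k in range(1, 7)}
E_X = sum(p * x for x, p in pmf.items())  # 3.5

E_sum = sum(px * py * (x + y) for (x, px), (y, py) in product(pmf.items(), pmf.items()))
E_prod = sum(px * py * (x * y) for (x, px), (y, py) in product(pmf.items(), pmf.items()))

print(abs(E_sum - 2 * E_X) < 1e-12)     # True: E(X+Y) = E(X) + E(Y)
print(abs(E_prod - E_X * E_X) < 1e-12)  # True: E(XY) = E(X)E(Y) by independence
```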
log(1/x) =
- log x
log(x/y) =
log x - log y
When is entropy maximised and minimised
Maximised when p is uniform: H(X) = log|X|
Minimised when p puts probability 1 on a single outcome and 0 on the rest: H(X) = 0
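Verifying both extremes numerically (uniform pmf on 8 symbols vs a degenerate pmf):

```python
import math

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

n = 8
print(entropy([1 / n] * n))                   # 3.0 = log2(8): uniform maximises H
print(entropy([1.0] + [0.0] * (n - 1)) == 0)  # True: a degenerate pmf gives H = 0
```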