# Assessing Fake News: An Information-Theoretic Approach

In 1948 Claude Shannon published a landmark paper that gave rise to a new field of science: information theory. A few years ago I published a not-so-groundbreaking post on how we can make inferences from biased sources of information. I’m going to follow up on that post by using one of the key insights of Shannon’s research to assess the quality of a news source.

Let’s say you have a random source of information. The probability that it outputs a given message $x$ is given by $p(x)$. Furthermore, let’s say we wish to construct a function, $s(x)$, that indicates how surprising a given message is. How might you want to construct such a function? An intuitive approach is to list a few constraints we want $s(x)$ to satisfy.

1. $s(x)$ decreases as $p(x)$ increases. The more likely an event, the less surprising it is.
2. If $p(x)=1$, then $s(x)=0$. An event that is certain should yield no surprise.
3. As $p(x)\rightarrow0$, $s(x)\rightarrow\infty$. The surprise of a message knows no bounds.

One such function that satisfies these conditions is $s(x) = \log \frac{1}{p(x)}$.
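To make this concrete, here is a minimal Python sketch of the surprise function. The name `surprise` and the choice of a base-2 logarithm (so the result is measured in bits) are my assumptions for illustration; the formula works with any base.

```python
import math


def surprise(p: float) -> float:
    """Surprise of a message with probability p, in bits (base 2 is an assumption)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return math.log2(1.0 / p)


# The three constraints, checked on a few sample probabilities:
print(surprise(1.0))    # a certain event carries no surprise
print(surprise(0.5))    # a coin flip
print(surprise(0.001))  # a rare event is far more surprising
```

A certain message scores zero, and the score keeps growing as the probability shrinks towards zero.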

From here we can go a step further and measure the average surprise of a source (aka the Shannon entropy), given by $H(X) = E[s(X)] = \sum_x p(x) \log \frac{1}{p(x)}$.
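As a rough sketch, the average surprise can be computed directly from a list of message probabilities. The function name `entropy`, the base-2 logarithm, and the convention that zero-probability messages contribute nothing to the sum are assumptions I am making here, not part of the formula itself.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a distribution given as a list of probabilities."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    # Messages with p == 0 never occur, so they are skipped (assumed convention).
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


print(entropy([0.5, 0.5]))    # two equally likely messages
print(entropy([0.25] * 4))    # four equally likely messages
print(entropy([1.0, 0.0]))    # one message is certain
```

Each term weights a message's surprise by how often it occurs, which is exactly the expectation $E[s(X)]$.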

If we take this formula in its most literal sense, it seems to reinforce our own intuitions about the quality of a news source. If a news source is always pro or always anti one side, then its Shannon entropy is 0. That is, there is no information to be gleaned from the signal. But if it occasionally surprises us, then $H(X)>0$. In fact, $H(X)$ reaches its maximum when all messages are equally likely.
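A quick way to check this intuition is a toy source with only two messages, "pro" and "anti". The sketch below is purely illustrative: the two-message model, the name `binary_entropy`, and the base-2 logarithm are all assumptions of mine, not a claim about how real news sources behave.

```python
import math


def binary_entropy(q: float) -> float:
    """Entropy in bits of a source that says 'pro' with probability q, else 'anti'."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in the interval [0, 1]")
    # Skip a zero-probability message; it contributes nothing (assumed convention).
    return sum(p * math.log2(1.0 / p) for p in (q, 1.0 - q) if p > 0)


# Sweep from a completely one-sided source to a perfectly balanced one.
for q in (1.0, 0.9, 0.7, 0.5):
    print(f"P(pro) = {q:.1f} -> H = {binary_entropy(q):.3f} bits")
```

The entropy is zero for the completely one-sided source and climbs to 1 bit, its maximum for two messages, when "pro" and "anti" are equally likely.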

Do you buy this literal interpretation of Shannon’s equations? If not, do you think it can be adjusted somehow?