In 1948 Claude Shannon published a landmark paper that gave rise to a new field of science: information theory. A few years ago I also published my not-so-groundbreaking post on how we can make inferences from biased sources of information. I’m going to follow up that post by assessing the quality of a news source using one of the key insights of Shannon’s research.

Let’s say you have a random source of information. The probability that it outputs a given message is $p$. Furthermore, let’s say we wish to construct a function, $s(p)$, that indicates how surprising a given message is. How might you construct such a function? An intuitive approach is to put a few constraints on what we want from $s$.

- $s(p)$ decreases as $p$ increases. The more likely an event, the less surprising it is.
- If $p = 1$, then $s(p) = 0$. A certain event should yield no surprise.
- As $p \to 0$, $s(p) \to \infty$. The surprise of an event knows no bounds.

One such function that satisfies these conditions is

$$s(p) = -\log p.$$
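To make the three constraints concrete, here is a minimal Python sketch of a surprise function based on the negative logarithm of the probability. The name `surprise` and the choice of base-2 logarithms (so the result is in bits) are my own assumptions for illustration; any other base only rescales the values.

```python
import math


def surprise(p: float) -> float:
    """Surprise of an event with probability p, in bits.

    Uses log2(1/p), which equals -log2(p). The base-2 logarithm is an
    assumption made for this sketch.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return math.log2(1.0 / p)


# Hypothetical probabilities, chosen only to show the trend:
# surprise is zero for a certain event and grows as p shrinks.
for p in (1.0, 0.5, 0.25, 0.001):
    print(f"p = {p}: surprise = {surprise(p):.3f} bits")
```

A probability of exactly 0 is rejected here because the surprise would be unbounded, which is the third constraint in action.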

From here we can go a step further and measure the *average* surprise of a source (aka the Shannon entropy), given by

$$H = -\sum_i p_i \log p_i,$$

where $p_i$ is the probability of the $i$-th message.

If we take this formula in its most literal sense, it seems to reinforce our own intuitions about the quality of a news source. If a news source is always pro or anti one side or the other, then its Shannon entropy is 0, i.e., there is no information to be gleaned from the signal. But if it occasionally surprises us, then $H > 0$. In fact, $H$ reaches its maximum when all messages are equally likely.
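As a rough illustration of the literal reading, the sketch below computes the Shannon entropy of a few made-up news sources. Everything here is an assumption for the sake of the example: the function name `entropy`, the base-2 logarithm, the two-message "pro"/"anti" model, and the probabilities themselves.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a list of message probabilities.

    Zero-probability messages are skipped, following the usual
    convention that 0 * log(0) counts as 0.
    """
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


# Hypothetical sources that emit either a "pro" or an "anti" message.
sources = {
    "always pro": [1.0, 0.0],   # never surprises us
    "mostly pro": [0.9, 0.1],   # occasionally surprises us
    "balanced": [0.5, 0.5],     # both messages equally likely
}

for name, probs in sources.items():
    print(f"{name}: H = {entropy(probs):.3f} bits")
```

Under these assumptions the one-sided source scores zero, the occasionally surprising one scores somewhere in between, and the balanced one hits the maximum for two messages, which is 1 bit.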

Do you buy this literal interpretation of Shannon’s equations? If not, do you think it can be adjusted somehow?