$T\mathcal{M}$

There is a connection somewhere

Feb 22, 2017

A note on Bayes' theorem

Thomas Kuhn discussed the notion of paradigms as things that every expert in a field assumes to be true. He then proceeded to study the overturning of these paradigms, scientific revolutions. An example of a paradigm would be the belief of 19th century physicists that every wave needs a medium, which was the key assumption of the ether theory. At the time this was a reasonable belief: every wave studied in detail so far had a medium, air for sound, water for tidal waves, and perhaps ropes for demonstration experiments, but light was different. Or perhaps not: it does not take too much redefinition to claim that an unexcited electromagnetic field constitutes a medium. It would certainly not be the medium 19th century physicists had in mind, but with that redefinition we change the claim "19th century physicists were mistaken about the existence of a medium." to "19th century physicists were mistaken about the nature of a medium."

Interestingly, a similar thing happens with Bayes' theorem: let $(X, \mu)$ be a measure space, $G \subset X$ a subset with finite measure and $A, B \subset G$. Then we can interpret $P(A)=\frac{\mu(A)}{\mu(G)}$ as a probability measure on $G$ and in particular

$$ P(A)=\frac{\mu(A)}{\mu(G)} =\frac{\mu(A\cap G)}{\mu(G)} = P(A|G). $$

Now let $F\subset X$ with $\mu(F)$ finite and $A\subset F$. Then

$$ P(B|A)=\frac{P(A\cap B)}{P(A)}\frac{P(B)}{P(B)}\\ =\frac{\mu(A\cap B)}{\mu(A)} \frac{\mu(B)}{\mu(G)} \frac{\mu(G)}{\mu(B)} \left(\frac{\mu(F)}{\mu(F)}\right)^2\\ =P(A|B)\frac{\mu(B)}{\mu(A)} \frac{\mu(F)}{\mu(F)}\\ = P(A|B)\frac{\tilde{P}(B)}{P(A|F)} $$ with $\tilde{P}(B)=\mu(B)/\mu(F)$, for which $\tilde{P}(B)=P(B)$ if $B\subset F$.
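
As a sanity check, here is a minimal sketch in Python (the intervals $G$, $F$, $A$, $B$ are my own choice, with $A\subset F$ but $B\not\subset F$), using interval length as the measure $\mu$:

```python
def length(interval):
    lo, hi = interval
    return max(hi - lo, 0.0)

def intersect(i, j):
    return (max(i[0], j[0]), min(i[1], j[1]))

# Hypothetical choices: A is contained in F and in G, B in G but not in F.
G, F = (0.0, 10.0), (0.0, 5.0)
A, B = (1.0, 3.0), (2.0, 6.0)

p_b_given_a = length(intersect(A, B)) / length(A)   # P(B|A) = 1/2 = 0.5
p_a_given_b = length(intersect(A, B)) / length(B)   # P(A|B) = 1/4 = 0.25
p_tilde_b   = length(B) / length(F)                 # P~(B)  = 4/5 = 0.8
p_a_given_f = length(intersect(A, F)) / length(F)   # P(A|F) = 2/5 = 0.4

# P(B|A) = P(A|B) * P~(B) / P(A|F)
assert abs(p_b_given_a - p_a_given_b * p_tilde_b / p_a_given_f) < 1e-12
```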

This is concerning for two reasons. First, there is no required relationship between $B$ and $F$; when some $F$ seems 'obviously' right, one tacitly assumes $B=B\cap F$ and thereby the standard form of Bayes' theorem. Second, even if $B\subset F$, we cannot choose between $F$ and $G$. This is the same situation as the difference between the existence and the nature of a medium above: it seems that there are interpretive systems that are too strong to be empirically accessible.

The Persian King of Kings Xerxes had the Hellespont whipped and cursed after a bridge across it collapsed. Obviously ridiculous, but the second bridge did hold.

I am not entirely sure what to make of these observations. Probably the most conservative interpretation is that a model does not necessarily hold at points where it was not tested. A bit more speculatively: we can add beliefs as long as they do not interfere (too much) with the actual workings of reality.


Jan 25, 2017

A proof of Bayes' theorem

A useful way to think about probabilities is as a formalization of Venn diagrams on measure spaces. Here I illustrate this point with a proof of Bayes' theorem.

A measure space is a set $X$ equipped with a measure $\mu$, that is, a function from the power set of $X$ to the reals,

$$ \mu : P(X) \rightarrow \mathbb{R} $$

such that

1) It is non-negative: $\mu(A) \ge 0$ for all subsets $A\in P(X)$

2) The measure of the empty set vanishes: $\mu(\emptyset)=0$

3) It is countably additive: for all countable collections $\lbrace A_i \rbrace_{i\in I}$ of pairwise disjoint sets

$$ \mu\left(\bigcup\limits_{i\in I} A_i\right) = \sum_{i\in I} \mu(A_i) $$

This definition basically abstracts the notion of an area: an area is never negative, it is zero only for lower dimensional objects, that is, points, lines and the empty set, and if I take two non-overlapping shapes, such as one table and another table, then I can add their areas. Note that there can be sets that have a measure of $0$ but are not empty.
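
To make this concrete, here is a minimal sketch in Python (the finite set and its weights are a hypothetical choice of mine) of a weighted counting measure, checking the three axioms in the finite case:

```python
# A weighted counting measure on a finite set: the measure of a subset
# is the sum of the (non-negative) weights of its points.

def mu(A, weights):
    """Measure of a subset A: the sum of its point weights."""
    return sum(weights[x] for x in A)

weights = {"a": 0.5, "b": 1.5, "c": 2.0}   # non-negative point masses
A, B = {"a"}, {"b", "c"}                   # two disjoint subsets

assert mu(set(), weights) == 0                                 # axiom 2
assert mu(A, weights) >= 0 and mu(B, weights) >= 0             # axiom 1
assert mu(A | B, weights) == mu(A, weights) + mu(B, weights)   # axiom 3 (finite case)
```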

As an exercise, one can prove directly that

$$ \mu(A\cup B)=\mu(A)+\mu(B)-\mu(A\cap B) $$

by considering the disjoint sets $A\setminus B$, $B\setminus A$ and $A\cap B$, and noticing that $A\cap B$ is counted twice in $\mu(A)+\mu(B)$. Compare this with the Venn diagram below.

2 set Venn diagram
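
The identity is also easy to check numerically; a small sketch using the counting measure (the concrete sets are my own example):

```python
# Checking mu(A u B) = mu(A) + mu(B) - mu(A n B) for the counting
# measure (every point has weight 1) on two overlapping finite sets.

def mu(A):
    return len(A)   # counting measure on finite sets

A = {1, 2, 3, 4}
B = {3, 4, 5}

assert mu(A | B) == mu(A) + mu(B) - mu(A & B)   # 5 == 4 + 3 - 2
```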

We need one extra ingredient for probabilities: the entire set should have measure $\mu(X)=1$, which we can always achieve by using $P(A)=\frac{\mu(A)}{\mu(X)}$, provided the measure of the entire space is finite. (We can directly check that for such a normalized measure space the Kolmogorov axioms of probability theory hold, and vice versa.)

To gain some intuition, imagine a barn with a target attached. When one throws a dart at the barn, the probability to hit the target is $\mu(\text{target})/\mu(\text{barn})$, at least assuming that the dart hits the barn. This motivates the notion of conditional probability: given that we already know that we hit the barn, what is the probability that we also hit the target? Or formally $$ P(A|B)=\frac{P(A\cap B)}{P(B)} $$

which basically states: if we already know that an event $B$ happens, for example the dart hits the barn, what is now the probability that $A$ also happens? In the illustration above, we already know that we are in the blue disk, and we wonder about the probability that we hit the shaded area $A\cap B$. More abstractly, we restrict our measure space from $X$ to $B \subset X$, and the measure to the power set of $B$.
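
A quick Monte Carlo sketch of the dart picture (the geometry, a 2x2 wall with a 1x1 barn and a 0.5x0.5 target, is my own toy choice): darts land uniformly on the wall, and we condition on the ones that hit the barn.

```python
import random

random.seed(0)

def in_barn(x, y):
    return x < 1 and y < 1        # barn: unit square in the corner of the wall

def in_target(x, y):
    return x < 0.5 and y < 0.5    # target: a quarter of the barn

barn_hits = target_hits = 0
for _ in range(200_000):
    x, y = 2 * random.random(), 2 * random.random()   # dart uniform on the 2x2 wall
    if in_barn(x, y):
        barn_hits += 1
        if in_target(x, y):
            target_hits += 1

# Estimate of P(target | barn); the exact value is mu(target)/mu(barn) = 0.25
print(target_hits / barn_hits)
```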

With this we can now prove Bayes' theorem. For two events that are not impossible, $P(A)\neq 0$ and $P(B)\neq 0$, we have the conditional probabilities $$ P(A|B)=\frac{P(A\cap B)}{P(B)} $$ $$ P(B|A)=\frac{P(B\cap A)}{P(A)} $$ Eliminating $P(A\cap B)$, we get the famous formula $$ P(A|B)=\frac{P(B|A)}{P(B)}P(A) $$ which in words says that the probability that $A$ happens, if we already know that $B$ happens, is the same as the probability that $B$ happens, if we already know that $A$ happens, times the ratio of the probabilities of $A$ and $B$.
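
Plugging in concrete numbers (a hypothetical but consistent choice of mine) confirms the formula:

```python
# A direct numerical check of Bayes' theorem; the three probabilities
# are chosen so that P(A n B) <= P(A), P(B).

p_ab = 0.1   # P(A n B)
p_a  = 0.2   # P(A)
p_b  = 0.4   # P(B)

p_a_given_b = p_ab / p_b   # 0.25
p_b_given_a = p_ab / p_a   # 0.5

# P(A|B) = P(B|A) / P(B) * P(A)
assert abs(p_a_given_b - p_b_given_a / p_b * p_a) < 1e-12
```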

In the Bayesian interpretation of probability one considers the probability $P(H)$ that a hypothesis $H$ is true. We can then use Bayes' theorem to incorporate new evidence into the hypothesis by using $$ P(H|E)=\frac{P(E|H)}{P(E)} P(H) $$ where $P(H)$ is the probability that the hypothesis is true, $P(E|H)$ is the probability of the evidence $E$ given $H$, and $P(E)$ is the overall probability of $E$. The trick here is that the right hand side consists entirely of known quantities, so we can calculate the left hand side. In particular, $P(E|H)$ can be calculated theoretically from the hypothesis, since it assumes that the hypothesis is true.
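
As an illustration, here is a minimal sketch of one such update (the coin scenario and all numbers are my own example); note that $P(E)$ has to be obtained separately, here via the law of total probability over the two competing hypotheses:

```python
# One Bayesian update. Hypothesis H: "the coin is biased, P(heads) = 0.9";
# the alternative is a fair coin. Evidence E: we observe one head.

p_h = 0.5                # prior P(H): both hypotheses equally likely
p_e_given_h = 0.9        # P(E|H), computed from the hypothesis itself
p_e_given_not_h = 0.5    # P(E|not H), computed from the fair-coin alternative

# P(E) by the law of total probability over the two hypotheses
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes' theorem: the posterior probability of the bias hypothesis
p_h_given_e = p_e_given_h / p_e * p_h
print(p_h_given_e)   # ~0.643: one observed head shifts belief towards the bias
```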
