Bayes’ theorem

What is the best way to explain Bayes’ theorem?

Of course, by giving good examples…

Why is Bayes’ theorem so important, and so famous?

Does it always give good results?

What is the best example to describe Bayes’ theorem?

Let’s look at the details…

First we need the definition:

From Wikipedia,

https://en.wikipedia.org/wiki/Bayes%27_theorem

A and B are events, with P(B) ≠ 0. Bayes’ theorem states:

P(A|B) = P(B|A) × P(A) / P(B)

P(A) – the prior probability of A, without knowing anything about B.

P(B|A) – the likelihood.

P(B) – the prior probability of B, without knowing anything about A, also called the marginal probability

or the normalization constant (we will see later why it is called this way…)

https://en.wikipedia.org/wiki/Marginal_distribution

P(A|B) – the posterior probability of A, knowing that B occurred.

This probability is called a conditional probability.

For the moment we can just say that A and B are somehow related, but we cannot tell which is the cause and which is the consequence; it depends on the example at hand.

Both A and B could also be consequences of some third, unknown variable at this stage…

Note: P(A|B), the probability of A knowing that B occurred, is different from the joint probability P(A and B), the probability of A and B occurring simultaneously; they are related by P(A and B) = P(A|B) × P(B). Only when A and B are independent does the joint probability equal P(A) × P(B).
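As a quick sanity check, the formula and the conditional-vs-joint distinction can be written in a few lines of Python. The probabilities below are made-up numbers, purely for illustration:

```python
def bayes_posterior(prior_a, likelihood_b_given_a, marginal_b):
    """P(A|B) = P(B|A) * P(A) / P(B)"""
    return likelihood_b_given_a * prior_a / marginal_b

# Made-up illustrative probabilities:
p_a = 0.3          # P(A), prior of A
p_b_given_a = 0.9  # P(B|A), likelihood
p_b = 0.5          # P(B), marginal probability of B

p_a_given_b = bayes_posterior(p_a, p_b_given_a, p_b)
print(round(p_a_given_b, 3))  # 0.9 * 0.3 / 0.5 = 0.54

# The conditional P(A|B) is not the joint probability:
# P(A and B) = P(A|B) * P(B), and it equals P(A) * P(B)
# only if A and B are independent.
p_joint = p_a_given_b * p_b
print(round(p_joint, 3))  # 0.27
```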

Let’s look at some examples…

a) You meet a man who speaks French. What is the probability that he is a French national?

Without any other information (about where you met him, what accent he has, etc.), the probability

that he is a French national is the following:

P(A) – the probability of being French without knowing any other information. If there are

about 7,000,000,000 inhabitants on Earth, and about 60,000,000 French nationals, the probability

of being French is 60,000,000/7,000,000,000.

P(B) – the probability of speaking French. French is spoken in France, Canada, Switzerland, Belgium, and several African countries; you would need the exact figures, but roughly 300,000,000 speakers worldwide.

P(B|A) – the probability that a French national speaks French. Let’s give it a high probability, 99%.

P(A|B) – the probability that a person who speaks French is a French national.

Or simply, P(Nationality|Language) = P(Language|Nationality) × P(Nationality) / P(Language)

= 0.99 × (60,000,000/7,000,000,000) / ((60,000,000 + French speakers in Canada, Switzerland, Belgium, African countries…)/7,000,000,000)

≈ 0.99 × (60,000,000/7,000,000,000) / (300,000,000/7,000,000,000)

= 0.99 × 0.0086/0.043

≈ 0.198

So only about a 20% chance. It may seem low, but remember, this is without any other information about the

person.

In this example, being French and speaking French are correlated variables, and being French is the cause while speaking French is the consequence.
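The calculation in example a) can be reproduced directly in Python, using the same rough population figures assumed above:

```python
world_population = 7_000_000_000
french_nationals = 60_000_000   # rough figure assumed above
french_speakers = 300_000_000   # rough worldwide figure assumed above

p_nationality = french_nationals / world_population   # P(A), prior
p_language = french_speakers / world_population       # P(B), marginal
p_language_given_nationality = 0.99                   # P(B|A), likelihood

# P(A|B) = P(B|A) * P(A) / P(B)
p_nationality_given_language = (
    p_language_given_nationality * p_nationality / p_language
)
print(round(p_nationality_given_language, 3))  # 0.198
```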

Another frequently mentioned example…

b) You have symptoms of some disease. What is the probability that you have the disease in question?

P(Disease|Symptoms)=P(Symptoms|Disease)*P(Disease)/P(Symptoms)

Let’s say your child has a red rash. What is the probability your child has measles?

Suppose the doctor says: “You know what, measles gives a red rash in 99% of cases!” (Be careful… he is only talking about the likelihood for the moment: P(Symptoms|Disease).)

The prior probability of having measles (without knowing anything about red rashes!) is P(Disease).

The doctor could estimate it, depending on the age of your child, whether or not the child was vaccinated, etc.

Let’s say it is 30%.

The probability of having a red rash is P(Symptoms); the rash could be caused by any disease that gives the same symptoms.

Let’s say it is 60%.

So P(Disease|Symptoms) = (0.99 × 0.3)/0.6 = 0.495
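Example b) in code, using the same illustrative figures (the 30% prior and 60% marginal are assumptions from the text, not real medical statistics):

```python
p_disease = 0.30                 # P(Disease), prior
p_symptoms = 0.60                # P(Symptoms), marginal
p_symptoms_given_disease = 0.99  # P(Symptoms|Disease), likelihood

# P(Disease|Symptoms) = P(Symptoms|Disease) * P(Disease) / P(Symptoms)
p_disease_given_symptoms = (
    p_symptoms_given_disease * p_disease / p_symptoms
)
print(round(p_disease_given_symptoms, 3))  # 0.495
```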

One of the most frequently asked questions about Bayes’ theorem is: how do you remember the formula?

I believe the best way is to remember the examples, and I gave you two simple ones.

There are several ways to interpret Bayes’ theorem, for example:

P(C|E) = P(E|C) × P(C) / P(E), where C is the cause and E is the effect (not always true! See above: it can be a simple correlation, without any causal link)

P(H|D) = P(D|H) × P(H) / P(D), where H is a hypothesis and D is the data.

A lot of modern theories use Bayes’ theorem. The most popular are:

a) Naive Bayes classifier

b) Bayesian networks & Causality theories

c) Bayesian statistics, Bayesian learning, Bayesian deep learning…

Also, one may ask: why do we call P(A) the prior and P(B) the marginal probability?

We could call either of them prior or marginal, since both are priors (known without reference to the other event) and both are marginals (see the Wikipedia link above to fully understand this). But when we are calculating P(A|B), we call P(A) the prior probability and P(B) the marginal probability.
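One way to see why P(B) is also called the normalization constant: expand it with the law of total probability, P(B) = P(B|A)·P(A) + P(B|not A)·P(not A). Dividing by this quantity is exactly what makes the posterior probabilities of A and not-A sum to 1. A short sketch with assumed numbers:

```python
p_a = 0.3              # P(A), assumed prior
p_b_given_a = 0.9      # P(B|A), assumed likelihood
p_b_given_not_a = 0.2  # P(B|not A), assumed

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = p_b_given_a * p_a / p_b                # posterior of A
p_not_a_given_b = p_b_given_not_a * (1 - p_a) / p_b  # posterior of not A

# Dividing by P(B) normalizes: the two posteriors sum to 1
# (up to floating-point rounding)
print(round(p_a_given_b + p_not_a_given_b, 10))
```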

To be continued…

See for more details and more examples:

https://www.investopedia.com/terms/b/bayes-theorem.asp

https://www.mathsisfun.com/data/bayes-theorem.html

https://plato.stanford.edu/entries/bayes-theorem/

https://corporatefinanceinstitute.com/resources/data-science/bayes-theorem/

https://www.cuemath.com/data/bayes-theorem/
