Introduction to Bayesian Methodology

Kevin T. Kelly
Department of Philosophy
Carnegie Mellon University


Mathematical probability theory

Think of propositions as sets of possible states of the world. Thus, "the sky is blue" picks out all world states in which the color of the sky is blue. Some of these world states will have houses and cars and others will not.


Bayesian methodology:

A rational agent whose degrees of belief are represented by probability function P should update her degrees of belief to P(.|e) after observing e.

By Bayes' theorem, the new degree of belief in h after seeing e is

P(h|e) = P(h)P(e|h)/P(e).

This formula is so important that the individual parts have special, time-honored names: P(h|e) is the posterior probability of h, P(h) is the prior probability of h, P(e|h) is the likelihood of h on e, and P(e) is the prior probability of the evidence. In order to ameliorate the problem of assigning P(e), pairwise comparisons of theories are made by looking at ratios of posterior probabilities:
P(h|e)/P(h'|e) = [P(h)/P(h')][P(e|h)/P(e|h')].
The ratio [P(h)/P(h')] is the prior ratio and the ratio [P(e|h)/P(e|h')] is the likelihood ratio.  Changes in relative probability between competing theories are governed entirely by the likelihood ratio, since the prior ratio is a fixed constant.
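As a numeric illustration, the factorization of the posterior ratio into prior ratio times likelihood ratio can be computed directly. All probability values below are hypothetical, chosen only to exhibit the algebra:

```python
# Sketch: posterior ratio = prior ratio * likelihood ratio.
# All numbers are made up for illustration.

def posterior_ratio(prior_h, prior_h_alt, like_h, like_h_alt):
    """P(h|e)/P(h'|e) = [P(h)/P(h')] * [P(e|h)/P(e|h')]."""
    return (prior_h / prior_h_alt) * (like_h / like_h_alt)

# With equal priors, the comparison is driven entirely by the likelihood ratio:
print(posterior_ratio(0.5, 0.5, 0.8, 0.2))  # -> 4.0
```

Note that the function never needs P(e): that troublesome term cancels when two theories are compared, which is exactly the point of working with ratios.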
 


Some methodological consequences:

High initial plausibility is good, explanations being similar: P(h1|e)/P(h2|e) = [P(h1)/P(h2)][P(e|h1)/P(e|h2)].

Refutation is fatal: If consistent e is inconsistent with h, then P(h|e) = 0.

Surprising predictions are good, initial plausibilities being similar: If h entails e, then P(h & e) = P(h), so P(h|e) = P(h)/P(e), which is greater insofar as P(e) is lower (i.e., the occurrence of e is more surprising).
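The effect can be seen numerically. The prior value 0.1 and the values of P(e) below are hypothetical; the point is only that a lower P(e) yields a higher posterior:

```python
# Sketch: when h entails e, P(h|e) = P(h)/P(e).
# The prior and the values of P(e) are hypothetical, for illustration only.
prior_h = 0.1
for p_e in (0.9, 0.5, 0.2):  # from unsurprising to surprising evidence
    posterior = prior_h / p_e
    print(f"P(e) = {p_e}: P(h|e) = {posterior:.3f}")
```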

Diminishing returns of repeated testing:  Once e has come to be expected (P(e) is high), the preceding argument shows that further confirmation from observing e is reduced.

Strong explanations are good, initial plausibilities being similar: The ratio P(h1|e)/P(h2|e) changes through time entirely as a function of the ratio of relative strength of explanation P(e|h1)/P(e|h2), for

P(h1|e)/P(h2|e) = [P(h1)/P(h2)][P(e|h1)/P(e|h2)] = k[P(e|h1)/P(e|h2)].
Unification is good, initial plausibilities being similar: A unified theory explains some regularity that the disunified theory does not. For example, Copernicus' theory entails that the total number of years must equal the total number of synodic periods + the total number of periods of revolution. To see this, suppose that data e, e' are independent a priori, so
P(e & e') = P(e)P(e').
Now suppose that e and e' remain independent given h1 but are completely dependent given h2, so that
h2 & e --> e'.  Thus
P(e & e'|h1) = P(e'|h1)P(e|h1) and
P(e & e'|h2) = P(e|h2).
So
P(h1|e & e')/P(h2|e & e') =
[P(h1)/P(h2)][P(e & e'|h1)/P(e & e'|h2)] =
k[P(e & e'|h1)/P(e & e'|h2)] =
k[P(e|h1)P(e'|h1)/P(e|h2)].
Now there is no reason to suppose that P(e|h1), P(e'|h1), and P(e|h2) are high, so the disunified theory has to overcome the effect of a product of low numbers while the unified theory does not. The more disunified phenomena a theory unifies compared to a competitor, the bigger this advantage becomes (suppose the likelihoods are all less than 0.5; then the degree of belief drops exponentially in the number of disunified phenomena).
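The exponential advantage can be checked with a small computation. The likelihood value q = 0.4 is hypothetical; any value below 0.5 shows the same pattern:

```python
# Sketch: with n phenomena, each with likelihood q < 0.5, the disunified
# theory h1 pays q**n (independence given h1), while the unified theory h2
# pays a single likelihood q (one entailment covers all the phenomena).
q = 0.4  # hypothetical likelihood
for n in (1, 2, 5, 10):
    disunified = q ** n   # P(e1 & ... & en | h1)
    unified = q           # P(e1 & ... & en | h2)
    print(f"n = {n:2d}: likelihood ratio h2/h1 = {unified / disunified:.1f}")
```

The ratio grows as (1/q)**(n-1), so the unified theory's advantage compounds with every further phenomenon it unifies.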

Saying more lowers probability: If h entails h', then P(h) <= P(h'), strictly whenever P(h' & ~h) > 0.

Conflict turns explanatory strength into an asset: Didn't we just say that strong explanations are good? That is true if the initial plausibilities are similar. But if one theory entails the other, they won't be, since saying more lowers probability. Thus, unification-style arguments only work if the competing theories are mutually contradictory!


Some defeasible objections

Scientific method should be objective. The method itself is objective: everybody is supposed to update by calculating conditional probabilities. Only some of the inputs to this method (the prior probabilities) are not objective.

Scientific method should not consider subjective, prior plausibilities. That's just the kind of blind, pre-paradigm science Kuhn ridicules as being sterile. Without prior plausibilities to guide inquiry, no useful experiments would ever be performed.

Priors should be flat. What is flat? If we are uncertain about the size of a cube, should we be indifferent about its side length, its face area, or its volume?

Whichever one we are unbiased about, we are strongly biased about the others!


Some more stubborn objections

High posterior probability doesn't mean that the theory is true. To some extent, one can show that the agent must believe that she will converge to the truth. But this doesn't mean that she will.

It isn't clear that numbers like P(e) even exist. One can respond with a protocol for eliciting such numbers, but in practice it doesn't always work. One can say that the subjects are "irrational", but the audience can always blame Bayesianism instead of the subjects.

The old evidence problem. If e is already known, then P(e) = 1, so P(e|h) = 1 for any h with P(h) > 0, and hence P(h|e) = P(h)P(e|h)/P(e) = P(h). So old evidence never "confirms" a hypothesis.
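The problem is visible in a one-line computation. The prior value 0.3 is made up; the point is that certain evidence leaves any prior untouched:

```python
# Sketch: Bayes updating on evidence that is already certain.
# If P(e) = 1 then P(e|h) = 1, and the posterior equals the prior.
def bayes_update(prior_h, like_e_given_h, p_e):
    """P(h|e) = P(h) * P(e|h) / P(e)."""
    return prior_h * like_e_given_h / p_e

print(bayes_update(0.3, 1.0, 1.0))  # old evidence: posterior == prior (0.3)
```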

Responses: