Ockham’s Razor: A New Justification, Project Web Page

Participants:

Kevin T. Kelly, Professor, Department of Philosophy, Carnegie Mellon University

Conor Mayo-Wilson, Department of Philosophy, Carnegie Mellon University

Hanti Lin, Department of Philosophy, Carnegie Mellon University

Associates:

Oliver Schulte, Professor, Computer Science, Simon Fraser University

Wei Luo, Computer Science, Simon Fraser University

Introduction

Philosophy of science, statistics, and machine learning all recommend the selection of simple theories or models on the basis of empirical data, where simplicity has something to do with minimizing independent entities, principles, causes, or equational coefficients. This intuitive preference for simplicity is called Ockham's razor, after the fourteenth-century theologian and logician William of Ockham. But in spite of its intuitive appeal, how could Ockham's razor help one find the true theory? For, in an updated version of Plato's Meno paradox, if we already know that the truth is simple, we don't need Ockham's help. And if we don't already know that the truth is simple, what entitles us to assume that it is?

 

Here is a new answer to that question. It is hopeless to provide an a priori explanation of how simplicity points at the truth immediately, since the truth may depend upon subtle empirical effects that have not yet been observed or even conceived of. The best that Ockham's razor could guarantee a priori is to keep us on the straightest possible path to the truth, allowing for unavoidable twists and turns along the way as new effects are discovered. In fact, it is possible to define empirical simplicity and efficient convergence to the truth so that Ockham’s razor is the uniquely most efficient strategy for converging to the truth. That result is called the Ockham efficiency theorem. This project is devoted to extending, refining, and understanding the significance of the Ockham efficiency theorem.

 

There is currently no other non-circular explanation of how Ockham’s razor helps one find true theories better than alternative methods.  “Overfitting” explanations of Ockham’s razor have to do with choosing a false theory to obtain more accurate predictions.   Bayesian explanations of Ockham’s razor are based on a circular appeal to a prior bias toward simple possibilities.  And pointing out that simple theories are more testable, explanatory, or unified does not explain why one should assume that the truth has these desirable formal properties. 

The Basic Idea

The following sketch is only a specific illustration of the Ockham efficiency theorem.  Objections should be addressed to the full published versions of the argument in (Kelly 2007b, 2007c, 2008a). 

 

  1. Suppose that the problem is to infer the true degree of a polynomial curve.

  2. The degree of a polynomial curve is understood to be n if and only if the curve’s degree n term has a non-zero coefficient and all of the curve’s terms of higher degree have zero coefficients.

  3. The data presented for the true curve Y = f(X) are not exact; they consist of increasingly tight open intervals around Y for each specified value of X.

  4. Every finite set of such intervals around a curve of degree n is compatible with a curve of degree n + 1.

  5. Ockham’s razor says to conclude the lowest polynomial degree compatible with the intervals presented so far (or to suspend judgment with “?”) and never to suspend judgment again until one’s current answer is no longer simplest.

  6. Ockham’s razor converges to the true degree, if there is one, and retracts (takes back) its answer at most n times when the true degree is n (constant curves have degree 0). Also, each retraction occurs precisely when the previous answer is refuted.

  7. In general, a method maps each finite set of open intervals that might be provided as data to either a guess at the polynomial degree or to “?”.

  8. A convergent method converges to the true polynomial degree, whatever it happens to be, as the arbitrarily precise data accumulate.

  9. Every convergent method retracts at least n times in some presentation of data for a curve of polynomial degree n, and these retractions occur at least as late as when the next lower degree is refuted. For nature can first present intervals around a constant curve until the convergent method says “degree 0” (otherwise, nature presents complete information about a curve of degree zero and the method fails to converge to the true answer). At that point, nature can tilt the flat curve to make it properly linear without violating any of the finitely many intervals presented so far and can continue to present ever tighter intervals until the method says “degree 1”, etc.

  10. So Ockham’s razor is efficient in terms of total retractions and the times at which the retractions occur, where efficiency of method M means that for all n, the worst-case retraction and retraction time bounds achieved by M over all worlds of polynomial degree n are no greater than the corresponding bounds achieved by an arbitrary convergent method M’.

  11. Moreover, whatever happened in the past, Ockham’s razor is efficient from that point onward. For even if Ockham’s razor has already been violated so that getting on the Ockham path demands a retraction, nature can still force that retraction later anyway. So Ockham’s razor is perfectly efficient, in the game theoretic sense of being efficient no matter what has happened earlier.

  12. Furthermore, for each method that violates Ockham’s razor, the violator is beaten by Ockham at the moment of violation, in the sense that the violator does worse in terms of worst-case timed retractions in each polynomial degree compatible with the data. For nature can force the violator back to the lowest polynomial degree compatible with the data and, thereafter, through every successive polynomial degree, so the violator produces an extra retraction, in each polynomial degree compatible with the data, compared with a method that hews to the Ockham path from that point onward.
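The forcing argument in the sketch above can be simulated in a few lines of Python. This is only a minimal illustration under a strong simplifying assumption that is not part of the original argument: each datum is abstracted to the lowest polynomial degree still compatible with the intervals presented so far, so no actual curve fitting takes place. The names `ockham`, `violator`, `force`, and `count_retractions` are hypothetical and introduced here for illustration only.

```python
from typing import Callable, List, Optional

Answer = Optional[int]                    # None plays the role of "?"
Method = Callable[[List[int]], Answer]    # maps the data so far to an answer

# ASSUMPTION: history[t] is the lowest polynomial degree compatible with
# the intervals presented up to stage t (an abstraction of the real data).


def ockham(history: List[int]) -> Answer:
    """Always output the lowest degree compatible with the data so far."""
    return history[-1]


def violator(history: List[int]) -> Answer:
    """Leap one degree too high at the first stage, then follow Ockham."""
    if len(history) == 1:
        return history[-1] + 1
    return history[-1]


def count_retractions(outputs: List[Answer]) -> int:
    """Count the stages at which a previously held answer is dropped."""
    return sum(
        1
        for prev, cur in zip(outputs, outputs[1:])
        if prev is not None and cur != prev
    )


def force(method: Method, true_degree: int, patience: int = 100) -> List[Answer]:
    """Nature's strategy: show data that look like degree k until the method
    says k, then reveal the degree k + 1 effect, up to the true degree."""
    history: List[int] = []
    outputs: List[Answer] = []
    for degree in range(true_degree + 1):
        for _ in range(patience):
            history.append(degree)
            answer = method(history)
            outputs.append(answer)
            if answer == degree:
                break
        else:
            # A convergent method must eventually take the bait.
            raise RuntimeError("method did not converge within patience")
    return outputs


print(count_retractions(force(ockham, 3)))    # 3
print(count_retractions(force(violator, 3)))  # 4
```

In this toy setting the Ockham method retracts exactly n times when the true degree is n, while the method that leaps ahead is forced back and ends up with one retraction more.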

 

Hence, we have:

Ockham Efficiency Theorem, beta version:

The following statements are equivalent:

 

  1. M is always Ockham.

  2. M always converges efficiently to the truth.

  3. No convergent method ever beats M.
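
The notion of efficiency used in the theorem can be written out symbolically for the retraction count. The notation below is assumed for illustration and is not taken from the cited papers: W_n is the set of worlds in which the true polynomial degree is n, and r(M, w) is the number of retractions that method M performs in world w. The comparison of retraction times is analogous and is omitted here.

```latex
% Worst-case number of retractions of method M over worlds of degree n
% (notation assumed for illustration).
\[
  \rho_n(M) \;=\; \sup_{w \in W_n} r(M, w)
\]

% M is efficient with respect to retractions when, for every degree n,
% its worst-case bound is no greater than that of any convergent method M'.
\[
  \rho_n(M) \;\le\; \rho_n(M')
  \qquad \text{for all } n \text{ and all convergent } M' .
\]

% In the polynomial example, Ockham's razor retracts at most n times in
% worlds of degree n, and every convergent method can be forced to retract
% at least n times there, so
\[
  \rho_n(M_{\mathrm{Ockham}}) \;=\; n \;\le\; \rho_n(M') .
\]
```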

Refinements to the beta version of the Ockham Efficiency Theorem:

  1. partially ordered simplicity degrees (Kelly 2007c, 2008a),

  2. cumulative errors as a cost (Kelly 2007c),

  3. unique definition of simplicity in terms of questions (Kelly 2007c, 2008a),

  4. axiomatic (non-unique) approach to simplicity (Kelly 2009c),

  5. answers split across simplicity degrees; e.g., “polynomial degree is even” (Kelly 2007c, 2008a),

  6. stochastic methods (Kelly and Mayo-Wilson 2009a),

  7. Bayesian and neo-Bayesian methods whose outputs are degrees of belief (Kelly 2009b),

  8. Ockham’s razor as a Nash equilibrium (Mayo-Wilson 2009),

  9. stochastic data and statistical inference (future work).

Progress

(Kelly 2002a) presents the first Ockham efficiency theorem, based on a backward-induction ordinal definition of simplicity inspired by the work of Freivalds and Smith on “procrastination learning”.  The detailed formal development is presented in (Kelly 2002b).   (Kelly and Glymour 2004) and (Kelly 2004) contrast the Ockham efficiency theorem with traditional Bayesian philosophy and present it in a philosophical context. 

 

The basic idea in the preceding papers was sound, but the backward-induction definition of simplicity, for all its intimidating mathematical machinery, can’t yet deal with the curve fitting example discussed above (because empirical complexity has an infinite ascending chain). A new, game theoretic definition of empirical simplicity is presented in (Kelly 2007d), along with a very elegant Ockham efficiency theorem based on monotone mappings of retraction times between methods. The basic idea is that more complex theories come later than simpler theories in sequences of theories nature can force an arbitrary, convergent scientific method to produce prior to convergence.

 

In (Kelly 2007c) a stronger result is obtained by comparing worst-case bounds within complexity classes and by refining the definition of empirical simplicity.  (Kelly 2007c) presents an even more general game theoretic definition of simplicity.   (Kelly 2007a, 2007b, 2008a) summarize the approach adopted in (Kelly 2008a) and relate it to issues in statistics and philosophy of science. 

Ongoing Research

Ockham’s razor in causal discovery

One exciting application area that depends heavily on Ockham’s razor is causal inference from correlational data. In causal inference the theories matter for prediction: getting a causal arrow backwards can throw predictions way off. It is shown (Kelly and Mayo-Wilson 2009b) that any method that converges to true causal structure in linear causal models can be forced to flip a given causal arrow in its conclusions any number of times. Hence, causal inference from correlational data cannot be reliable. But the usual Ockham techniques do converge efficiently to the truth in the sense described above. So the Ockham efficiency theorem is the only non-circular foundation on the books for this important application.

Axiomatic, non-unique simplicity

From the outset, the idea was to define simplicity as uniquely as possible.  But that idea now appears to have been a mistake: forcing a unique simplicity ranking on worlds when the problem does not do so has resulted in weaker Ockham efficiency theorems than necessary.  Also, it seems philosophically wrong to impose structure where the question as presented lacks structure.   The manuscript (Kelly 2009c) adopts a looser approach, defining what counts as a simplicity concept and countenancing a multiplicity of simplicity concepts in questions lacking the robust simplicity structure exhibited by the curve fitting problem.

 

Ockham efficiency extended to stochastic scientific methods

The Ockham efficiency theorems published thus far show that a deterministic version of Ockham’s razor is more efficient than all deterministic competitors. But it is a familiar fact in game theory that one can often improve worst-case performance by adopting a random strategy (as in rock-paper-scissors). The manuscript (Kelly and Mayo-Wilson 2009a) shows that the deterministic versions of Ockham’s razor are, indeed, efficient against arbitrary stochastic strategies with discrete states. Indeed, only trivial variants of the deterministic Ockham strategy are efficient. The result is entirely parallel to the deterministic Ockham efficiency theorem. Surprisingly, no independence assumptions are required.
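
The game-theoretic fact mentioned above, that randomizing can improve worst-case performance, is easy to check for rock-paper-scissors itself. The following is a small sketch of that standard example only; it says nothing about the Ockham setting, and the payoff convention (+1 for a win, 0 for a tie, -1 for a loss) and the names `payoff` and `worst_case` are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Dict

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine: str, theirs: str) -> int:
    """+1 for a win, 0 for a tie, -1 for a loss (assumed convention)."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def worst_case(strategy: Dict[str, Fraction]) -> Fraction:
    """Lowest expected payoff of a (possibly mixed) strategy over all
    pure replies by the opponent."""
    return min(
        sum(prob * payoff(move, reply) for move, prob in strategy.items())
        for reply in MOVES
    )


always_rock = {"rock": Fraction(1)}
uniform = {move: Fraction(1, 3) for move in MOVES}

print(worst_case(always_rock))  # -1
print(worst_case(uniform))      # 0
```

Every pure strategy has worst-case payoff -1, whereas the uniform random strategy guarantees an expected payoff of 0. The result reported above is that no comparable gain from randomization is available against Ockham's razor.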

Ockham’s razor as a game-theoretic equilibrium

During the course of the work on stochastic methods, analogies between the Ockham efficiency theorem and standard game theory became more apparent, raising the natural question whether the theorem is an application of Nash’s theorem for two-person zero-sum games.  In (Mayo-Wilson 2009), it is shown that Nash’s assumptions are violated by the theorem but, nonetheless, Ockham’s razor can be portrayed as a unique Nash equilibrium of a generalized sort.  This result will help to position the Ockham efficiency theorem with respect to standard theories of rationality. 

Ockham’s razor for Bayesians

The Ockham efficiency theorems are stated in terms of theory choice, raising the question how they apply when methods adopt degrees of belief over theories.  The story is more interesting than expected (Kelly 2009b).  Belief profiles over potential answers to a question can be viewed as vectors in Euclidean space and convergence can be viewed as motion through space.  Ockham efficiency can then be viewed as minimization of spatial distance traveled and it turns out that only methods that maintain probabilistic coherence at each stage of inquiry are efficient.  But this result turns out to require a strong Ockham bias even prior to seeing any data.   A more plausible result allowing for initial suspension of judgment results if one measures total drops in credence rather than total distance traveled.  But then coherence is inefficient.  Methods employing sub-additive or imprecise probabilities, on the other hand, can be retraction efficient.  Moreover, the mysterious phenomenon of “dilation” or decreased probabilistic precision in light of increasing information turns out to be necessary for efficiency.  This study provides an entirely new perspective on Bayesian and neo-Bayesian rationality. 
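
The two cost measures contrasted above can be made concrete with a short sketch. The definitions below are one plausible reading of "spatial distance traveled" and "total drops in credence" and are assumptions for illustration; the precise definitions in (Kelly 2009b) may differ. A belief profile is represented as a tuple of credences, one per potential answer, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Profile = Sequence[float]  # one credence per potential answer


def distance_traveled(profiles: List[Profile]) -> float:
    """Total Euclidean distance between successive belief profiles."""
    return sum(math.dist(a, b) for a, b in zip(profiles, profiles[1:]))


def total_credence_drops(profiles: List[Profile]) -> float:
    """Sum of all decreases in credence, answer by answer, stage by stage.
    Increases in credence cost nothing under this measure."""
    return sum(
        max(0.0, before - after)
        for a, b in zip(profiles, profiles[1:])
        for before, after in zip(a, b)
    )


# A coherent agent that is certain of each answer in turn.
leaps = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# A sub-additive agent that starts with no credence in any answer
# (initial suspension of judgment) and then commits to the first answer.
suspends = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

print(distance_traveled(leaps), total_credence_drops(leaps))
print(distance_traveled(suspends), total_credence_drops(suspends))
```

Under the drop measure, moving from suspension to a definite answer is free, which is one way to see why that measure allows initial suspension of judgment while the distance measure penalizes it.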

Ockham’s razor in statistical inference

The ultimate goal of the project is to lift the Ockham efficiency theorem one more time from stochastic methods receiving deterministic data (Kelly and Mayo-Wilson 2009a) to stochastic methods receiving stochastic data (statistical theory choice).  That work remains for part 2 of the grant project.

Online Tutorials

Seth Casana (2005) Animated tutorial on Ockham Efficiency.

 

Kevin T. Kelly (2007) Power point lecture. “Simplicity and Truth: an Alternative Explanation of Ockham's Razor”,  keynote address, 8th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL'07), Birmingham, UK, Fall 2007. 

Papers

Kevin T. Kelly (2002b) “A Close Shave with Realism: Ockham's Razor Derived from Efficient Convergence”,  manuscript.

 

Kevin T. Kelly (2002a) “Efficient Convergence Implies Ockham's Razor”, Proceedings of the 2002 International Workshop on Computational Models of Scientific Reasoning and Applications, Las Vegas, USA, June 24-27.

 

Kevin T. Kelly and Clark Glymour (2004) “Why Probability Does Not Capture the Logic of Scientific Justification”, in Christopher Hitchcock, ed., Contemporary Debates in the Philosophy of Science, London: Blackwell.

 

Kevin T. Kelly (2004) "Justification as Truth-finding Efficiency: How Ockham's Razor Works", Minds and Machines 14: 485-505.

 

Kevin T. Kelly (2007a) “A New Solution to the Puzzle of Simplicity”, Philosophy of Science 74: 561-573.

 

Kevin T. Kelly (2007b) “Ockham’s Razor, Empirical Complexity, and Truth-finding Efficiency”, Theoretical Computer Science, 383: 270-289.

 

Kevin T. Kelly (2007c) “How Simplicity Helps You Find the Truth Without Pointing at it”, in Philosophy of Mathematics and Induction, V. Harizanov, M. Friend, and N. Goethe, eds., Dordrecht: Springer.

 

Kevin T. Kelly (2007d) “Simplicity, Truth, and the Unending Game of Science”, in Infinite Games: Foundations of the Formal Sciences V, S. Bold, B. Löwe, T. Räsch, and J. van Benthem, eds., Roskilde: College Press, pp. 223-270.

 

Kevin T. Kelly (2008a) “Ockham’s Razor, Truth, and Information”, in Handbook of the Philosophy of Information, J. van Benthem and P. Adriaans, eds., Dordrecht: Elsevier.

 

Kevin T. Kelly (2008b) “Five Answers”, in Epistemology: 5 Questions, V. Hendricks and D. Pritchard, eds., Copenhagen: Automatic Press.

 

Kevin T. Kelly (2009a) “Argument, Inquiry, and the Unity of Science”, forthcoming in Sciences and Methods, Bijoy Mukherjee, ed., Kolkata: Asiatic Society. 

 

Kevin T. Kelly (2009b) “Ockham’s Razor, Hume’s Problem, Ellsberg’s Paradox, Dilation, and Optimal Truth Conduciveness”, under review at Synthese.  

 

Kevin T. Kelly and Conor Mayo-Wilson (2009a) “Ockham Efficiency Theorem for Random Empirical Methods”, completed draft subject to revision, comments welcome.

 

Kevin T. Kelly (2009c) “A Topological Theory of Empirical Simplicity and its Connection to the Truth”, in preparation, comments welcome.

 

Kevin T. Kelly and Conor Mayo-Wilson (2009b) “Causal Discovery, Causal Retractions, and Their Minimization”, in preparation, comments welcome.

 

Conor Mayo-Wilson (2009) “Ockham's Shaky Razor: Efficient Convergence by Random Methods”, in preparation, master’s thesis, Department of Philosophy, Carnegie Mellon University.

Public Lectures

Kevin T. Kelly (2001) “Simplicity Deduced from Efficient Convergence”, 40th Anniversary Conference, Center for Philosophy of Science, University of Pittsburgh.

 

Kevin T. Kelly (2002) “Efficient Convergence Implies Ockham's Razor”, International Workshop on Computational Models of Scientific Reasoning and Applications, Las Vegas, USA.

 

Kevin T. Kelly (2004) “How Ockham's Razor Helps You Find the Truth”, Department of Logic and Philosophy of Science, U.C. Irvine.

 

Kevin T. Kelly (2004) “Ockham's Razor”, Center for Philosophy of Science, University of Pittsburgh.

 

Kevin T. Kelly (2004) “Ockham's Razor, Efficiency, and the Infinite Game of Science”, Plenary Lecture, Foundations of the Formal Sciences 2004: Infinite Game Theory, Bonn, Germany.

 

Kevin T. Kelly (2005) “Learning, Simplicity, Truth, and Misinformation”, invited presentation, International Workshop on the Philosophy of Information, Amsterdam, Netherlands, Spring 2005.

 

Kevin T. Kelly (2005) “Ockham, Complexity, and Truth”, American Mathematical Society, Santa Barbara.

 

Kevin T. Kelly (2005) “Ockham's Razor: What it is, What it isn't, How it Works and How it Doesn't”, Symposium on Logic in the Humanities, Stanford University, Spring 2006.

 

Kevin T. Kelly (2005) “Ockham's Razor: What it is, what it isn't, how it works, and how it doesn't”, plenary tutorial, Second Annual Formal Epistemology Workshop, University of Texas, Austin. Power point lecture.

 

Kevin T. Kelly (2006) “Philosophical Logic and Reliability”, Philosophical Logic Symposium, Carnegie Mellon.

 

Kevin T. Kelly (2006) “A New Solution to the Puzzle of Simplicity”, Philosophy of Science Association biennial meeting, Vancouver.

 

Kevin T. Kelly (2007) “Ockham’s Razor Without Circles, Evasions, or Magic”, Formal Epistemology Workshop, Pittsburgh.

 

Kevin T. Kelly (2007) “Truth-conduciveness Without Reliability: A Non-Theological Explanation of Ockham’s Razor”, Working Group in History and Philosophy of Logic, Mathematics, and Science, University of California, Berkeley.

 

Kevin T. Kelly (2007) “Simplicity and Truth: an Alternative Explanation of Ockham's Razor”, Keynote address, 8th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL'07), Birmingham, UK, Fall 2007. Power point lecture.

 

Kevin T. Kelly (2007) “Simplicity and Truth”, Department of Philosophy, University of Jaipur, India.

 

Kevin T. Kelly (2008) “Ockham’s Razor in Causal Discovery: A New Explanation”, National Institute of Science, Technology, and Development Studies, University of Delhi, India.

 

Kevin T. Kelly (2008) “Simplicity, Truth, and Causation: A New Explanation of Ockham’s Razor”, Platinum Anniversary Lecture on Causation, Indian Statistical Institute, Kolkata, India.

 

Kevin T. Kelly (2008) “Unity of Science Without Dogma”, Keynote Lecture, Conference on Sciences and Methods, Asiatic Society, Kolkata, India.

 

Kevin T. Kelly (2008) “Relations of Ideas are Matters of Fact: A Unified Theory of Theoretical Unification”, Ideals of Proof workshop, Nancy, France.

 

Kevin T. Kelly (2008) “Ockham’s Razor Without Circles, Evasions, or Magic”,  Department of History and Philosophy of the Sciences, Sorbonne, Paris, France.

 

Conor Mayo-Wilson (2008) “Theoretical Virtues and the Repeated Game of Science”,  10th annual Rocky Mountain Philosophy Conference.

 

Conor Mayo-Wilson (2008) “Mixed Strategies in Formal Learning Theory and Ockham's Razor”,  Decisions, Games, and Logic 08, Institute for Logic, Language, and Computation, Amsterdam.  

Graduate Seminars

80-300 Simplicity, Carnegie Mellon, Department of Philosophy, Fall 2008.

Grant Support

NSF grant number:  0740681. 

Proposal Title: Ockham's Razor: A New Justification
Principal Investigator: Kevin Kelly
Staff:  
Conor Mayo-Wilson, 2008
Hanti Lin, 2009
Performing Organization: Carnegie Mellon University
NSF Division: Division of Social and Economic Sciences
NSF Program: History and Philosophy of Science Engineering and Technology
Program Officer: Dr. Frederick Mark Kronz