15-312 Principles of Programming Languages

Why Study Programming Languages?

This is a course on the principles of programming languages. Why study these principles? Because they are fundamental to the design, implementation, and application of programming languages.

Programming language design is often regarded as largely, or even entirely, a matter of opinion, with few, if any, organizing principles, and no generally accepted facts. Dozens of languages are in everyday use in research laboratories and in industry, each with its adherents and detractors. The relative merits of languages are debated endlessly, but always, it seems, with an inconclusive outcome. Some would even suggest that all languages are equivalent, the only difference being a matter of personal taste. Yet it is obvious that programming languages do matter!

Yet can we really say that Java is “better” (or “worse”) than C++? Is Scheme “better” than Lisp? Is ML “better” than either of them? Can we hope to give substance to any of these questions? Or should we simply reserve them for late night bull sessions over a glass of beer? While there is certainly an irreducible subjective element in programming language design, there is also a rigorous scientific theory of programming languages that provides a framework for posing, and sometimes answering, such questions. To be sure there are good questions for which current theory offers no solutions, but surprisingly many issues are amenable to a rigorous analysis, providing definite answers to many questions. Programming language theory liberates us from the tar pit of personal opinion, and elevates us to the level of respectable scientific discourse.

Programming language theory is fundamental to the implementation of programming languages, as well as their design. While compiler writers have long drawn on the theory of grammars for parsing and on graph theory for register allocation, the methods used to compile well-known languages such as C do not rely on deep results from programming language theory. For relatively simple languages, relatively simple compilation methods suffice. But as languages become more sophisticated, so must more sophisticated methods be employed to compile them.

For example, some programs can be made substantially more efficient if code generation is deferred until some run-time data is available. A tight inner loop might be “unrolled” into a linear instruction sequence once the iteration bound is determined. This is one example of partial evaluation, a technique for program specialization that rests on results from programming language theory. To take another example, modern languages such as ML (and proposed extensions of Java) include what are known as parameterized types to support flexible code re-use. Parameterized types complicate compilers considerably because they must account for situations in which the type of a variable or function argument is not known at compile time. The most effective methods for handling parameterized types rely on typed intermediate languages with quite sophisticated type systems. Here again programming language theory provides the foundation for building such compilers.

Programming language theory has many applications to programming practice. For example, “little languages” arise frequently in software systems --- command languages, scripting languages, configuration files, mark-up languages, and so on. All too often the basic principles of programming languages are neglected in their design, with all too familiar results. After all, the argument goes, these are “just” scripting languages, or “just” mark-up languages, why bother too much about them? One reason is that what starts out as “just” an ad hoc little language often grows into much more than that, to the point that it is, or ought to be, a fully-fledged language in its own right. Programming language theory can serve as a guide to the design and implementation of special purpose, as well as general purpose, languages.

Another application of the theory of programming languages is to provide a rigorous foundation for software engineering. Formal methods for software engineering are grounded in the theory of specification and verification. A specification is a logical formula describing the intended behavior of a program. There are all kinds of specifications, ranging from simple typing conditions (“the result is a floating point number between 0 and 1”) to complex invariants governing shared variables in a concurrent program. Verification is the process of checking that the implementation indeed satisfies the specification. Much work has gone into the development of tools for specifying and verifying programs. Programming language theory makes precise the connection between the code and its specification, and provides the basis for constructing tools for program analysis.

The theory of programming languages provides a “reality check” on programming methodology, that part of software engineering concerned with the codification of successful approaches to software development. For example, the merits of object-oriented programming for software development are well known and widely touted. Object-oriented methodology relies heavily the notions of subtyping and inheritance. In many accounts these two notions are confused, or even conflated into one concept, apparently because both are concerned with the idea of one class being an enrichment of another. But careful analysis reveals that the two concepts are, and must be, distinct: confusing them leads to programs that violate abstraction boundaries or even incur run-time faults.

The purpose of the this course is to introduce the basic principles, methods, and results of programming languages to undergraduate students who have completed the introductory sequence in computer science at Carnegie Mellon.  I intend for students to develop an appreciation for the benefits (and limitations) of the rigorous analysis of programming concepts.

Author: Robert Harper
Last modified: Fri Dec 9 23:39:30 EST 2016