May 17, 2005 (Lecture 2)

The Three Essential Skills for Software Development

I believe that the three essential skills for software development are abstraction, symbolic representation and manipulation, and interpretation.
What do I mean by each of these?

abstraction: One critical skill required of a software developer is the ability to take a look at a complex problem and reduce it to its essential properties. As software developers we need to decide which properties of the system should be part of our software model, and which properties we can neglect as unimportant.

symbolic representation and manipulation: As software developers, we need to go farther than simply understanding the system. After we have developed an abstract understanding of the system, we must be able to define it symbolically. We need to take our understanding of the system and represent it in a way that is processable by the computer. For example, this semester, we'll need to represent our mental models of various systems in the Java programming language.
We also need the ability to manipulate systems that have already been represented symbolically. We need to be able to look at source code and change it to correct misrepresentations or to reflect changes in our understanding of the system it represents.

interpretation: As we've discussed, as software developers, we need to develop abstract models of systems, represent these systems symbolically, and manipulate them in the symbolic domain. But this isn't enough. We must also be able to interpret symbolic representations of systems and realize their abstract meaning. This is necessary in order to truly understand what a piece of software does. It is also necessary while maintaining software, to ensure that the changes being made actually affect the abstract system as intended.
Those with the ability to interpret symbolic systems don't view the system as a collection of symbols. Instead, they see through the symbols and view the system that they represent. For example, I view music printed on a sheet of paper and try to "clap it out". A musician can read the printed page and appreciate the music. I am told that the best composers enjoy hearing their work performed, but don't need to hear it to know how it sounds.

Consider this. In the early days of the IS boom in the mid-to-late 1970s and early 1980s, when computer programmers were scarce, IBM developed a solution. Like most, they hired those with degrees in "information systems" and "computer science" (very few CS programs existed at this time).
And, like most, they turned to those who had similar skills in abstraction, symbolic representation and manipulation, and interpretation, and trained them for the job. These folks typically included mathematicians, engineers, physicists, and people with certain types of business backgrounds.
But, they developed a not-so-secret strategy in the battle for talent. They began hiring musicians -- in large numbers. What IBM understood was that learning to program is easy -- if you have mastered the three essential skills -- and that talented musicians had demonstrated an aptitude for these in a different domain.
I won't say that every punk rocker hired by IBM turned out to be an outstanding programmer. But their yield was, in fact, high. Certainly programmers need to know a programming language well enough to express ideas freely -- but IBM's experience demonstrates that syntax and semantics are far from the hard part of the problem.
As a footnote, let me add that the first "burst of the bubble" didn't occur recently, with the "Dot Com Bust". The salaries of computer science and IS graduates were depressed through much of the mid-to-late 1980s, and only really recovered in the early 1990s. This was largely due to the rapid growth of CS departments in response to the demand of the late 1970s and early 1980s. Believe it or not, there were more people enrolled in computer science and related degree programs in the early 1980s than today! Talk about a glut of people hitting the market!

The Early Days of Programming Education

Two or three decades ago, when the teaching of computer programming became big business, the landscape was quite different. We were, quite literally, teaching computer programming. Our students were electrical engineers, physicists, mathematicians, business professionals, &c.
Each of these groups of people had mastered the three essential skills of abstraction, symbolic representation and manipulation, and interpretation in their own domain. Physicists were already accustomed to considering physical systems, large and small, forming hypotheses, representing the essential properties of these systems mathematically, manipulating the equations, and interpreting the results. Similarly, for electrical engineers, viewing complex electrical and electronic circuits as systems of differential equations was child's play. Mathematicians were already accustomed to generating a symbolic language for the representation of abstract systems with properties of their own creation, and to manipulating and interpreting these symbolic representations -- in ways that appear "Greek to the rest of us". And business people were the inventors of "flow charts" and "entity relationship diagrams". These were tools used to describe business processes and resources.
They came to us, the teachers of computer programming, and asked us to teach them how to program. And, quite literally, that is what they meant, "Show me how to map between my symbolic language and yours." And so, that's what we did. We taught them the syntax and semantics of languages such as FORmula TRANslation (FORTRAN) and the COmmon Business Oriented Language (COBOL). Beyond teaching them our symbolic language, we didn't need to teach them the three essential skills -- they had already mastered them in their own domains. We simply showed them how to convert their symbolic representations into machine-readable representations.

Computer Science Education Today

Today, computers are everywhere. They are no longer tools for specialized scientific, engineering, and business problems. Instead, they are used for all sorts of different things. And so, in effect, a new discipline has emerged. Today "computer scientists" come in all shapes and sizes and address all sorts of problems.
The students, like you, who arrive and say, "I want to learn to program" are asking for something entirely different than your predecessors of 20-30 years ago. You aren't asking to simply be shown the syntax and semantics of Java, or some other programming language, for this is quite insufficient for any purpose.
Unlike your predecessors, you aren't physicists, mathematicians, business people (or musicians). You haven't already mastered the three essential skills in your own domain -- and you want to be able to solve problems across many domains.
So, when you say, "I want to learn to program", you really mean, "Help me to learn everything I need to learn to solve problems using a computer." You want us to teach you not only the syntax and semantics of a language -- but also how to use that language to solve problems. You want us to help you to master the three essential skills as well as teach you the language.
And so, this semester, I will do my best to do just that. And, we'll start exactly there.

Software Design and the Object-Oriented Approach

There are many different approaches to the design of software systems. Over the years, we have developed and taught many different design methodologies. In the past, we began programming courses by discussing flow charts and/or top-down diagrams. And sometimes we even discussed the merits of bottom-up approaches. But these approaches proved too limited for complex problems, and a new methodology came to light: object-oriented programming (OOP).
At the heart of the object-oriented approach are objects. Don't expect a complex mathematical description or a long technical definition. You won't find one, at least here. Instead let me just suggest that objects are all of the individual identifiable components of a system. If you want to point at it, name it, use it, or talk about it, it is an object.
One way of discovering the objects in a system is to describe it to someone else. Then consider all of the nouns that you used in your description. These are probably objects.
A very important piece of object-oriented design is, as you might imagine, describing the objects within the system. Again, a conversational model might be useful here. Oftentimes, when we describe things to each other, we talk about the objects first in terms of their behavior. For example, if a little child asks you, "What is a car?", you would first tell the child that it can move forward and backward and turn. And then, you would tell the child that it can carry people. Eventually you would tell the child that cars come in all different colors, and that some have two doors and others four doors, &c.
This is the same process that we'll use to describe the objects in software systems. After we identify each object, we'll ask ourselves, "What does this object do?" We call the things that an object can do its behaviors. We also ask ourselves, "What do we need to tell it to get it to do these things? What does it need to know?" For example, if we want a car to move forward, we need to tell it, "how fast". If we want it to turn left, we have to tell it, "how much". These are the parameters of its behaviors.
After considering the age-old question, "What can the object do for us?", we'll ask ourselves, "What are the other properties of the object?" What color is it? How big is it? How many doors does it have? These aspects of an object we call its attributes.
Many of these attributes are visible attributes: we can observe them from outside of the object. But some of the attributes we can only discover by inference. We'll call these the hidden attributes.
For example, if we consider a typical calculator, the only number we can see on the display is the most recent input. But, by observing how a calculator adds and subtracts, we can infer that it maintains an accumulator containing the most recent result. We can't see the accumulator, but we know that it must exist -- without it, we cannot explain the behavior of the calculator.
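The calculator can be sketched in Java to make the distinction concrete. This is only an illustration under assumptions: the class name Calculator, its method names, and the choice of double for the numbers are all invented here. The display is the visible attribute, the accumulator is the hidden one, and the number passed to enter() is a parameter of a behavior.

```java
// Hypothetical sketch: every name below is an illustrative assumption.
class Calculator {
    private double accumulator = 0.0; // hidden attribute: the running result
    private double display = 0.0;     // visible attribute: the most recent input

    // Behavior with a parameter: we must tell the calculator "which number".
    void enter(double number) { display = number; }

    // Behaviors that change the hidden accumulator.
    void add() { accumulator += display; }
    void subtract() { accumulator -= display; }

    // Behavior: copy the hidden result onto the display.
    void showResult() { display = accumulator; }

    // The only attribute an outside observer can read.
    double getDisplay() { return display; }
}

class CalculatorDemo {
    public static void main(String[] args) {
        Calculator calc = new Calculator();
        calc.enter(5);
        calc.add();
        calc.enter(3);
        calc.add();
        System.out.println(calc.getDisplay()); // prints 3.0, the most recent input
        calc.showResult();
        System.out.println(calc.getDisplay()); // prints 8.0, the accumulated result
    }
}
```

Nothing outside the class can read the accumulator directly. Just as with a real calculator, we only learn that it exists by watching what the display does.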

Classes: Different Types of Objects

If I would ask you to describe the contents of a room with only one person inside, you would probably describe that one person, by name. But, if I would ask you to describe a room with 100 people inside, you would probably approach the problem slightly differently.
You would describe a generic person first. You would tell me what all people have in common, and how they differ. Then, you would tell me about each object in the room by first identifying it as a person, and then telling me about how it is special -- by describing its collection of attributes.
In object-oriented languages, we can do exactly the same thing. We can describe types of objects, a.k.a., classes of objects. We do this by writing a class specification that describes the behaviors and attributes of a class of objects. Then, when we create new objects, we do it by building them according to some class specification and "filling in the blanks" for each attribute. So, in object oriented languages, new objects are "instances of a class" meaning that they are built according to some class specification.
We can define a class for cars that observes that cars can have differing numbers of doors, differing sizes of engines, and differing colors, but are all capable of moving forward, backward, and turning left and right. Then, we can create a new car, specifying that it should be red and have two doors. Sometimes in using the car we might be concerned with the number of doors -- and on other occasions we just might want to drive it forward without worrying about it.
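Here is one way the car description might look as a Java class specification. Treat it as a sketch: the attribute names, the units (an integer speed, a heading in degrees), and the convention that reverse is a negative speed are assumptions made for the example, not part of the lecture.

```java
// Hypothetical sketch of a class specification for cars.
class Car {
    // Attributes: the "blanks" that are filled in for each instance.
    private final String color;
    private final int doors;
    private int speed = 0;   // assumed convention: negative means reverse
    private int heading = 0; // assumed convention: degrees, 0 to 359

    // Building a new car means filling in the blanks.
    Car(String color, int doors) {
        this.color = color;
        this.doors = doors;
    }

    // Behaviors, each with its parameter ("how fast", "how much").
    void moveForward(int howFast) { speed = howFast; }
    void moveBackward(int howFast) { speed = -howFast; }
    void turnLeft(int howMuch) { heading = Math.floorMod(heading - howMuch, 360); }
    void turnRight(int howMuch) { heading = Math.floorMod(heading + howMuch, 360); }

    String getColor() { return color; }
    int getDoors() { return doors; }
    int getSpeed() { return speed; }
    int getHeading() { return heading; }
}

class CarDemo {
    public static void main(String[] args) {
        // Two instances of the same class, with different attribute values.
        Car red = new Car("red", 2);
        Car blue = new Car("blue", 4);

        red.moveForward(30); // drive it without worrying about the doors
        red.turnLeft(90);

        System.out.println(red.getColor() + ", " + red.getDoors() + " doors");
        System.out.println("heading " + red.getHeading()); // prints heading 270
        System.out.println(blue.getSpeed());                // prints 0
    }
}
```

Both cars are built from the same specification, yet each carries its own values, so driving the red car leaves the blue one parked.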

Objects Around The Room

To help you guys with these questions, a small example follows. Let's model something in the real world, a class in one of the 5419 clusters:

What do we have in the class?

People - What kind of people?
    Females
    Males
Computers - What kind of computers?
    Desktops - What kind of desktops?
        Macs
    Laptops - What kind of laptops?
        Intel PC
        iBook
    PDAs - What kind of PDAs?
        Palm PC
        Linux
Walls - What kind of walls?
    Fixed
    Moveable (partition)

The Object-Oriented Approach

In the example above, everything we listed was a "type of thing." In Java, a type is a classification based on attributes including behaviors. There are few, if any, programming languages that have built-in types for People, Computers and Walls, however. In the case of the example above, we would have to create classes for our things. A class is simply a type that we define that specifies the behaviors and attributes of a collection of things or "objects". In Java, an object is something that is created based on the class "blueprint" and is considered to be an instance of a class. Classes are a great way to decompose a problem and allow you to break a program into smaller, more manageable parts through the use of inheritance and composition.

Inheritance and Composition

So what is inheritance? It's easier to describe what can be done with inheritance rather than what inheritance actually is. In the example above, it is clear that a Laptop is a Computer. Since a Laptop is a Computer, it has everything a Computer has. A Laptop is considered to be a subclass of a Computer. A subclass (also called a child or derived class) has all the attributes and behaviors of its parent class (also called a base class).
So you see, inheritance is a method for defining "is a" relationships. An iBook is a Laptop is a Computer.
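In Java, the "is a" relationship is written with the extends keyword. The sketch below uses the Computer, Laptop, and iBook classes from the room example. The methods boot(), closeLid(), and brand() are made up purely so that each class has something to inherit or add.

```java
// Hypothetical sketch: the methods are invented for illustration.
class Computer {
    String boot() { return "booting"; }
}

// A Laptop is a Computer: it inherits boot() and adds closeLid().
class Laptop extends Computer {
    String closeLid() { return "sleeping"; }
}

// An IBook is a Laptop (and therefore also a Computer).
class IBook extends Laptop {
    String brand() { return "Apple"; }
}

class InheritanceDemo {
    public static void main(String[] args) {
        IBook book = new IBook();
        System.out.println(book.boot());     // inherited from Computer: prints booting
        System.out.println(book.closeLid()); // inherited from Laptop: prints sleeping
        System.out.println(book.brand());    // defined in IBook: prints Apple

        // Because an IBook is a Computer, it can be used wherever a Computer is expected.
        Computer c = book;
        System.out.println(c instanceof Laptop); // prints true
    }
}
```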
Now that we're done with inheritance, we can move on to composition. Again, it's easier to describe what can be done with composition rather than what composition actually is. If we wanted to model a human, chances are we would want to model the different parts that make up a human, such as the heart, the lungs, the brain, etc. A human does not inherit the properties of each of these parts, however. Rather, a human is a larger container that holds all of these smaller parts. When using composition, the idea is to define a bunch of small types, then define a larger type that contains all of the small types.
Unlike inheritance, composition is a method for defining "has a" relationships. A human has a brain. A human has a heart.
*IMPORTANT NOTE: DO NOT confuse inheritance with composition. Sometimes it is difficult to figure out which to use at the code level, but try not to get frustrated. Remember the "is a" and "has a" relationship rules and you should be fine.
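For contrast, here is a rough sketch of the "has a" relationship from the human example. The small classes Heart and Brain and their methods are assumptions invented for the illustration. The point to notice is that Human does not extend either of them; it holds them as attributes.

```java
// Hypothetical sketch: small types first, then a larger type that contains them.
class Heart {
    private int beats = 0;
    void beat() { beats++; }
    int getBeats() { return beats; }
}

class Brain {
    String think() { return "thinking"; }
}

// A Human has a Heart and has a Brain. A Human is neither of them.
class Human {
    private final Heart heart = new Heart();
    private final Brain brain = new Brain();

    // The human's behaviors are built by asking its parts to do their jobs.
    void live() { heart.beat(); }
    String ponder() { return brain.think(); }
    int getHeartbeats() { return heart.getBeats(); }
}

class CompositionDemo {
    public static void main(String[] args) {
        Human h = new Human();
        h.live();
        h.live();
        System.out.println(h.getHeartbeats()); // prints 2
        System.out.println(h.ponder());        // prints thinking
    }
}
```

A quick test when you are unsure which to use: say the sentence aloud. "A human is a heart" sounds wrong, so inheritance is the wrong tool here; "a human has a heart" sounds right, so the heart becomes an attribute.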

The Complete Process

Now that we've covered some more material, we're ready to flesh out our process for solving a problem.

Identify Objects that are part of the system including their behaviors and attributes.
Think of these objects as instances of classes. Try to determine what type of "thing" each object is.
Identify the relationships among the classes. Any two classes will either have no relationship, have a subclass/parent class relationship, or be related through composition.
Specify the behaviors and attributes of each class of objects. If you do this in plain English, this is what we call writing a spec. If you do this in code, it is programming. In almost all cases it is helpful to write a spec before programming. You will write cleaner, more organized code much faster than you probably would otherwise.
Refactor your design and reimplement if necessary. Refactoring allows you to move things around in your class hierarchy to better represent your intended design.
Finally, run the program and repair if necessary.

Thinking About Programming

A program models a system and allows for observation of output and performance. That's great, but what do you need to figure out before you start constructing your program? Here's a helpful list of questions to cover before you begin coding:

What problem do you need to solve?

What are the components/properties/behaviors involved in the problem?

How do you model those components/properties/behaviors?

How do you set the system in motion and measure results?

Java Vocabulary and Syntax

The last thing to cover is some simple Java vocabulary, as well as some basic syntax. This will be especially helpful for people who have programmed in C++ before.
In Java, instantiating a class essentially means building an object based on the blueprint provided by your (or Java's) class definition. To do this in Java, we use the "new" keyword like so: new Person(); When you "new" something in Java, you create the object and get back a reference to it. A reference is simply a name for an object. A reference variable, however, is a variable capable of holding the name of an object. For example:
If we say: Person p;
In C++, the compiler actually builds a Person object and p is a Person. In Java, p is NOT a Person, but a reference variable capable of holding the name of a Person object. No Person has been created. To create a Person object in Java, you would use the following syntax: Person p = new Person();
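A short sketch may help make the difference between an object and a reference variable concrete. The Person class below, with its single name field, is a stand-in invented for this example.

```java
// Hypothetical sketch: Person and its name field are assumptions.
class Person {
    String name = "unnamed";
}

class ReferenceDemo {
    public static void main(String[] args) {
        Person p;         // a reference variable; no Person exists yet
        p = new Person(); // now a Person object exists, and p holds its name

        Person q = p;     // a second reference variable naming the SAME object
        q.name = "Ada";   // a change made through q ...

        System.out.println(p.name); // ... is visible through p: prints Ada
        System.out.println(p == q); // prints true: both name one object
    }
}
```

Only one Person was ever created here, because "new" appears only once. Copying a reference variable copies the name, not the object.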

A Quick Exercise

Before leaving, we led the class in our first programming exercise: "Hello World". We entered, compiled, and ran the following code:
  import java.io.*;

  class Program {
    public static void main (String[] args) {
      System.out.println ("Hello world");
    }
  }
  

Compilation, Execution, and the JVM

The first step in the process is to "compile" the program. This means translating the Java code into a lower-level set of instructions that a machine can execute directly.
In many languages, this compilation process produces an executable program that runs on the actual hardware. The problem with this approach is that each program must be recompiled for each platform. For example, the same C++ program would need to be compiled separately for each of PC, Sun, SGI, and Mac computers. This makes the software more complex to deploy and maintain.
Java uses a virtual machine, known as the Java Virtual Machine (JVM). Basically, Sun defined a hardware system and then wrote a program to simulate it. This single program is compiled for each supported architecture. Then, Java programs are compiled for the JVM. As a result, any compiled Java program can run on any JVM -- regardless of the actual, physical host hardware. This is said to be a "compile once run anywhere" system.
So, when you run "javac SomeClass.java", it compiles the Java code to machine code for the JVM. Java calls this machine code bytecode. When you run "java SomeClass", you are starting the JVM program and asking it to start the static main() method of the class SomeClass, defined within the file SomeClass.class.
Static methods, sometimes known as class methods, aren't actually behaviors of instantiated objects. Instead, they are just "things to do" that live within the namespace of the class. That is why the JVM can call main() before any object has been created.
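To see the difference between a class method and an ordinary behavior, consider this small sketch. The Counter class is an invented example: increment() is a behavior of each individual object, while getCreated() is a static method that belongs to the class itself and is called through the class name.

```java
// Hypothetical sketch: Counter and its members are assumptions.
class Counter {
    private static int created = 0; // one copy, shared by the whole class
    private int count = 0;          // one copy per object

    Counter() { created++; }

    // Instance method: a behavior of one particular Counter object.
    void increment() { count++; }
    int getCount() { return count; }

    // Static method: lives in the class's namespace; needs no object.
    static int getCreated() { return created; }
}

class StaticDemo {
    // main() is itself static, so the JVM can call it without creating a StaticDemo.
    public static void main(String[] args) {
        Counter a = new Counter();
        Counter b = new Counter();
        a.increment();
        a.increment();
        b.increment();

        System.out.println(a.getCount());         // prints 2
        System.out.println(b.getCount());         // prints 1
        System.out.println(Counter.getCreated()); // called on the class, not an object
    }
}
```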