September 3, 2008 (Recitation 1)


System Calls

You have probably already heard the term "System Call." Do you know what it means? As its name implies, a system call is a "call", that is, a transfer of control from one instruction to a distant instruction. A system call differs from a regular procedure call in that the callee executes in a privileged state, i.e., the callee is within the operating system.

For security and sanity, calls into the operating system must be carefully controlled, so there is a well-defined and limited set of system calls. This restriction is enforced by the hardware through trap vectors: only those OS addresses entered, at boot time, into the trap (interrupt) vector are valid destinations of a system call. Thus, a system call is a call that crosses a protection boundary in a controlled manner.

Since the process abstraction is maintained by the OS, xsh will need to make calls into the OS in order to control its child processes. These calls are system calls. In UNIX, you can distinguish system calls from user-level library (programmer's API) calls because system calls appear in section 2 of the "manual", whereas user-level calls appear in section 3 of the "manual". The "manual" is, in UNIX, what you get when you use the "man" command. For example, man fork will get you the "man page" in section 2 of the manual that describes the fork() syscall, and man -s 2 exec will get you the "man page" that describes the family of "exec" syscalls (a syscall, hence -s 2).
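To make the section 2 versus section 3 distinction concrete, here is a minimal sketch. The helper name say() is hypothetical and exists only for illustration. It pairs strlen(), which is documented in section 3, with write(), which is documented in section 2. Strictly speaking, the write() you call from C is a thin library wrapper that performs the trap into the kernel on your behalf.

```c
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * strlen() is a library routine (section 3): it runs entirely in user space.
 * write() is a system call (section 2): it traps into the kernel.
 */
ssize_t say(int fd, const char *msg)
{
    size_t len = strlen(msg);      /* ordinary procedure call */
    return write(fd, msg, len);    /* crosses into the operating system */
}
```

Calling say(STDOUT_FILENO, "hello\n") would write the message to standard output and return the number of bytes written, or -1 on error.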

The following UNIX syscalls may prove to be especially useful in your solution to this project. There are plenty of others, so you may find "man" and good reference books useful, especially if you are new to system programming.

Process Creation

To create a new process we use the fork() system call. The fork system call actually clones the calling process, with very few differences. The clone has a different process id (PID) and parent process id (PPID). There are some other minor differences; see the man page for details.

The return value of fork() is the only way that a process can tell whether it is the parent or the child (the child is the new one). fork() returns the PID of the child to the parent and 0 to the child; on failure it returns -1 to the caller and no child is created. This subtle difference allows the two separate processes to take two different paths, if necessary. The wait_() family of functions allows a parent process to wait for a child process to complete.
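The three possible return values of fork() can be sketched as follows. The helper fork_and_collect() is a hypothetical name used only for this illustration: the child exits immediately with a chosen code, and the parent collects that code with waitpid().

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork, let the child exit with child_code, and
 * return the exit code that the parent collects (-1 on any error).
 */
int fork_and_collect(int child_code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {              /* fork failed: no child was created */
        perror("fork");
        return -1;
    }
    if (pid == 0) {             /* fork returned 0: we are the child */
        _exit(child_code);
    }

    /* fork returned the child's PID: we are the parent */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit() rather than exit() so that it does not flush stdio buffers it inherited from the parent.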

It is important to note that the wait_() family of functions can report changes in a child's status other than termination. For example, with the WUNTRACED flag, waitpid() also returns when a child is stopped. Many status changes you may want to ignore. You may also want to take a look at the flags in the man page for waitpid(); you may find WNOHANG and others helpful. (WNOHANG makes the wait non-blocking: if there is no news, it returns immediately; otherwise it lets you collect the information that is available.)
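As a rough sketch of non-blocking waiting, the hypothetical helper below polls one child with WNOHANG until it terminates. The function name, the 10 ms pause, and the choice to return -1 for a child killed by a signal are all assumptions made for the example, not requirements of the project.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll child `pid` with WNOHANG until it terminates.
 * Returns the child's exit code, or -1 on error or death by signal.
 * *polls counts how many times waitpid() reported "no news".
 */
int poll_until_done(pid_t pid, int *polls)
{
    int status;
    struct timespec pause = { 0, 10 * 1000 * 1000 };   /* 10 ms */

    *polls = 0;
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r < 0)
            return -1;              /* error, e.g. not our child */
        if (r == 0) {               /* child exists but has no news yet */
            (*polls)++;
            nanosleep(&pause, NULL);
            continue;
        }
        /* Without WUNTRACED, only termination is reported here. */
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        return -1;                  /* terminated by a signal */
    }
}
```

A shell would normally do other work between polls rather than sleep; the pause here only keeps the sketch from spinning.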

The following example shows a call to waitpid(), which waits for a specific child. wait() will wait for any child. There are several other flavors. We'll discuss more about what the execvp() within the child does shortly.

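    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
      int status;
      pid_t pid;
      char *prog_argv[4];

      /* Build argument list */
      prog_argv[0] = "/usr/local/bin/ls";
      prog_argv[1] = "-l";
      prog_argv[2] = "/";
      prog_argv[3] = NULL;

      /* Create a process space for the ls */
      if ((pid = fork()) < 0) {
        perror("Fork failed");
        exit(EXIT_FAILURE);
      }

      if (!pid) {
        /* This is the child, so execute the ls */
        execvp(prog_argv[0], prog_argv);
        /* execvp() returns only if it failed */
        perror("Exec failed");
        _exit(EXIT_FAILURE);
      }

      if (pid) {
        /* We're in the parent; let's wait for the child to finish */
        waitpid(pid, &status, 0);
      }

      return 0;
    }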

It is important for a parent to wait for the children that it creates. This can be done either in a blocking fashion for foreground processes, or in a non-blocking fashion (WNOHANG) when the parent is signaled (SIGCHLD) that a child has changed state. Although many of the resources composing a process are freed when it dies, the process control block (PCB), or at least some of its information, is not. The PCB contains status information that the parent can collect via wait_(). A process in this state is called defunct, or a zombie. After the wait_(), the PCB is freed. If the parent dies before the child, the orphaned child is reparented to the init process, which performs a wait_() for any such process, allowing the PCB to be freed.
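One common way to reap background children without blocking is a SIGCHLD handler that calls waitpid() with WNOHANG in a loop. The sketch below is only one possible arrangement; the names on_sigchld, install_sigchld_reaper, and reaped_count are hypothetical, and a real shell would likely record each child's status instead of just counting.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of children reaped so far (hypothetical bookkeeping). */
static volatile sig_atomic_t reaped_count = 0;

/* SIGCHLD handler: collect every child that has already terminated. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;        /* waitpid() may overwrite errno */
    (void)signo;

    /* Several children may exit before the handler runs, so loop. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_count++;

    errno = saved_errno;
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_sigchld_reaper(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    /* Restart interrupted syscalls; ignore stop notifications. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

Because the handler reaps with waitpid(-1, ...), it would also collect a foreground child, so a shell that mixes blocking and signal-driven waits needs to coordinate the two.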

What If I Don't Want A Clone?

The exec_() family of calls allows a process to substitute another program for itself. Typically a program will call fork() to generate a duplicate copy of itself, and the child will call an exec_() function to start another program.

There are several different flavors of exec_(). They all boil down to the same call within the kernel. One parameterization may be more or less convenient from time-to-time.

An exec'd program isn't completely different from the calling process. It inherits some things, such as the PPID, GID, and signal mask, but not signal handlers. Please see the man page for the details.

The exec_() functions do not return (a new program is now in charge). At least it is fair to say that if they do return, something bad has happened. The previous example code also illustrates execvp().
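Since exec_() returns only on failure, any code after the call is error handling by definition. Here is a minimal sketch of that pattern; the helper name spawn_and_wait() is hypothetical, and the exit code 127 is an assumption borrowed from the convention shells use for "command not found".

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with arguments argv in a child and
 * return its exit code. Returns 127 if the program could not be
 * exec'd, or -1 if fork/waitpid failed or the child died from a signal.
 */
int spawn_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        /* Reaching this line means execvp() failed. */
        perror(argv[0]);
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Without the _exit() after the failed execvp(), the child would fall through and keep running the parent's code, leaving two copies of the shell.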


Operating systems know a lot of things. They maintain and constantly update information about not only their own state, but also the processes and resources that they manage.

In days gone by, operating systems provided access to this information only through proprietary, non-standard interfaces. Most of these interfaces were painful to use, because each type of information had to be queried separately, via a separate syscall. Porting software that required this type of access to OS internals was also painful: not only did the details of the interface vary from system to system, but the paradigm sometimes changed as well.

Enter the /proc virtual file system. It isn't a real file system, but it provides a uniform way of exposing this type of information.

Basically, the kernel uses the file system interface as an interface to this internally maintained information. A user traverses the file system, lands at a file, directory, or link, and the info is theirs. These aren't real files. Instead, the OS just packages the information this way, generating it dynamically on demand, and representing it using the familiar file system interface.

Sometimes the information is contained within a file. Sometimes it is represented as a link to a file or directory (for example, a link whose target is the process's current directory), and sometimes it is revealed through the file name. From UNIX to UNIX, the organization of /proc can differ, but at least the paradigm is the same.

We took a quick look at Linux's /proc on the unix.andrew systems. In particular, we observed the directories representing processes were numbered. We also observed that file protections (ownership, group, bits) are used to prevent changing things that shouldn't be changed or by those who shouldn't be changing them.
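The numbered process directories can be found with opendir() and readdir(). The following is a minimal sketch, assuming a Linux-style /proc in which each running process appears as a directory whose name is its PID; the helper names is_all_digits() and list_pid_dirs() are hypothetical.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/* Return 1 if name is a non-empty string of decimal digits, else 0. */
int is_all_digits(const char *name)
{
    if (*name == '\0')
        return 0;
    for (; *name != '\0'; name++) {
        if (!isdigit((unsigned char)*name))
            return 0;
    }
    return 1;
}

/*
 * Print every entry under root whose name looks like a PID.
 * Returns the number of such entries, or -1 if root cannot be opened.
 */
int list_pid_dirs(const char *root)
{
    DIR *dir = opendir(root);
    struct dirent *entry;
    int count = 0;

    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        if (is_all_digits(entry->d_name)) {
            printf("%s/%s\n", root, entry->d_name);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

Calling list_pid_dirs("/proc") would print one line per process visible at that moment; processes can appear and disappear between the listing and any later access.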

We noted that in Linux, some of the system's configuration can be changed via /proc. These "kernel tunables" are much easier to change via /proc, by writing to a single file, than by rebuilding the kernel.
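Because these entries look like ordinary files, ordinary file I/O is enough to read them. The sketch below reads the first line of a file; read_first_line() is a hypothetical helper, and the path in the usage note assumes a Linux system, where tunables live under /proc/sys.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the first line of the file at path into buf
 * (at most size - 1 characters), without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty.
 */
int read_first_line(const char *path, char *buf, size_t size)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, (int)size, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    buf[strcspn(buf, "\n")] = '\0';     /* strip the newline, if any */
    return 0;
}
```

On Linux, read_first_line("/proc/sys/kernel/hostname", buf, sizeof buf) would fetch the current host name. Changing a tunable is the mirror image, a write to the same file, and typically requires root privileges.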

The first lab is great fun and is yours for the doing!

In class, we also took a quick look at opendir(), readdir(), and getopt(), which might be of help to you in hacking it out.
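For getopt(), a minimal sketch might look like the following. The option letters -v and -p, and the helper name parse_options(), are hypothetical and chosen only for illustration; they are not the options the lab requires.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: parse "-v" (a flag) and "-p value" (an option
 * with an argument). Returns the index of the first non-option
 * argument, or -1 if an unknown option or missing argument is seen.
 */
int parse_options(int argc, char *argv[], int *verbose, const char **pid_arg)
{
    int opt;

    optind = 1;     /* start scanning from the first argument */
    while ((opt = getopt(argc, argv, "vp:")) != -1) {
        switch (opt) {
        case 'v':
            *verbose = 1;
            break;
        case 'p':
            *pid_arg = optarg;      /* points into argv, not a copy */
            break;
        default:                    /* getopt() already printed a message */
            fprintf(stderr, "usage: %s [-v] [-p pid] [args...]\n", argv[0]);
            return -1;
        }
    }
    return optind;
}
```

The colon after p in the option string "vp:" tells getopt() that -p takes an argument, which it hands back through optarg.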