Return to the lecture notes index
April 4, 2008 (Lecture 28)


Before discussing distributed file systems, it makes sense to discuss file systems. And, it makes sense to begin with the concept of a file. What is a file?

Similarly, a file is an abstraction. It is a collection of data that is organized by its users. The data within the file isn't necessarily meaningful to the OS, the OS many not know how it is organized -- or even why some collection of bytes have been organized together as a file. None-the-less, it is the job of the operating system to provide a convenient way of locating each such collection of data and manipulating it, while protecting it from unintended or malicious damage by those who should not have access to it, and ensuring its privacy, as appropriate.

Introduction: File Systems

A file system is nothing more than the component of the operating system charged with managing files. It is responsible for interacting with the lower-level IO subsystem used to access the file data, as well as managing the files, themselves, and providing the API by which application programmers can manipulate the files.

Factors In Filesystem Design

  1. naming
  2. operations
  3. storage layout
  4. failure resiliance
  5. efficiency (lost space is not recovered when a process ends as it is with RAM, the penalty is also higher for frequent a factor of 106)
  6. sharing and concurrency
  7. protection


The simplest type of naming scheme is a flat space of objects. In this model, there are only two real issues: naming and aliasing.

Naming involves:


Aliasing is the ability to have more than one name for the same file. If aliasing is to be permitted, we must detemrine what types. It is useful for several reasons:

There are two basic types:

In order to implement hard links, we must have low level names.

UNIX has low-level names, they are called inodes. The pair (device number, inode # is unique). The inode also serves as the data structure that represents the file within the OS, keeping track of all of its metadata. In contrast, MS-DOS uniquely names files by their location on disk -- this scheme does not allow for hard links.

Hierarchical Naming

Real systems use hierarchical names, not flat names. The reason for this relates to scale. The human mind copes with large scale in a hierarchical fashion.It is essentially a human cognative limitation, we deal with large numbers of things by categorizing the. Every large human organization is hierarchical: army, companies, church, etc.

Furthermore, too many names are hard to remember and it can be hard to generate unique names.

With a hierarchical name space only a small fraction of the full namespace is visible at any level. Internal nodes are directories and leaf nodes are files. The pathname is a representation of the path from the leafnode to the root of the tree.

The process of translating a pathname is known as name resolution. We must translate the pathname one step at a time to allow for symbolic links.

Every process is associated with a current directory.  The low level name is evaluated by chdir().If we follow a symbolic link to a location and try to "cd ..", we won't follow the symbolic link back to our original location -- the system doesn't remember how we got there, it takes us to the parent directory.

The ".." relationship superimpsoes a Directed Acyclic Graph(DAG) onto the directory structure, which may contain cycles via links.

Have you ever seen duplicate listings for the same page in Web searche ngines? This is because it is impossible to impose a DAG onto Web space -- not only is it not a DAG on any level, it is very highly connected.

Each directory is created with two implicit components

Directory Entries

What exactly is inside of each directory entry aside form the file or directory name?

UNIX directory entries are simple: name and inode #. The inode contains all of the metadata about the file -- everything you see when you type "ls -l". It also contains the information about where (which sectors) on disk the fiel is stored.

MS-DOS directory entries are much more complex. They actually contain the meta-data about the file:

Unix keeps similar information in the inode. We'll discuss the inode in detail very soon.

File System Operations

File system operations generally fall into one of three categories:

From open() to the inode

Before going any farther, I'd like to review a few details from 15-213, where certain file system data structures, and the process of opening a file, are discussed. Just to make sure that we're on the same page. Then, I'll charge forward into storage allocation, which is new in 15-412.

The operating system maintains two data structures representing the state of open files: the per-process file descriptor table and the system-wide open file table.

When a process calls open(), a new entry is created in the open file table. A pointer to this entry is stored in the process's file descriptor table. The file descriptor table is a simple array of pointers into the open file table. We call the index into the file descriptor table a file descriptor. It is this file descriptor that is returned by open(). When a process accesses a file, it uses the file descriptor to index into the file descriptor table and locate the corresponding entry in the open file table.

The open file table contains several pieces of information about each file:

Each entry in the open file table maintains its own read/write pointer for three important reasons:

One important note: In modern operating systems, the "open file table" is usually a doubly linked list, not a static table. This ensures that it is typically a reasonable size while capable of accomodating workloads that use massive numbers of files.

Session Semantics

Consider the cost of many reads or writes may to one file.

The solution is to amortize the cost of this overhead over many operations by viewing operations on a file as within a session. open() creates a session and returns a handle and close() ends the session and destroys the state. The overhead can be paid once and shared by all operations.

Consequences of Fork()ing

In the absence of fork(), there is a one-to-one mapping from the file descriptor table to the open file table. But fork introduces several complications, since the parent task's file descriptor table is cloned. In other words, the child process inherits all of the parent's file descriptors -- but new entries are not created in the system-wide open file table.

One interesting consequence of this is that reads and writes in one process can affect another process. If the parent reads or writes, it will move the offset pointer in the open file table entry -- this will affect the parent and all children. The same is of course true of operations performed by the children.

What happens when the parent or child closes a shared file descriptor?

Why clone the file descriptors on fork()?