October 30, 2006 (Lecture 11)

Lab #5

Lab #5 was released today. It is really a one-week lab, but you will want to read the handout before Thursday.

File I/O

We spent some time today reviewing basic C file I/O, including the stdio functions fopen(), fclose(), fread(), fwrite(), and fseek(), and the lower-level system calls open(), close(), read(), write(), and lseek().

For this project, you will want to use open(), close(), read(), write(), and lseek(). If you aren't familiar with them, check out the man pages and write some example code.
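As a starting point for that example code, here is a minimal sketch of the five system calls working together, assuming a POSIX system. The function name roundtrip and its parameters are hypothetical, chosen only for illustration: it creates a file, writes a string, seeks back to the beginning, and reads the string back.

```c
#include <fcntl.h>     /* open(), O_* flags */
#include <stdio.h>     /* perror() */
#include <string.h>    /* strlen() */
#include <sys/types.h> /* off_t, ssize_t */
#include <unistd.h>    /* read(), write(), lseek(), close() */

/* Hypothetical helper: writes msg to path, seeks back to offset 0, and
 * reads up to buflen bytes into buf. Returns bytes read, or -1 on error. */
ssize_t roundtrip(const char *path, const char *msg, char *buf, size_t buflen)
{
    /* Create (or truncate) the file, readable and writable by the owner. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    size_t len = strlen(msg);
    /* write() may write fewer bytes than asked; for brevity, treat that as an error. */
    if (write(fd, msg, len) != (ssize_t)len) {
        perror("write");
        close(fd);
        return -1;
    }

    /* The file offset is now at the end; move it back to byte 0 before reading. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        perror("lseek");
        close(fd);
        return -1;
    }

    ssize_t n = read(fd, buf, buflen);
    if (n < 0)
        perror("read");

    close(fd);
    return n;
}
```

Note that read() does not null-terminate the buffer; the caller must leave room for a terminator and add it if the data is to be treated as a string.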

Sparse files

It is worth remembering that file storage is allocated per block, not per byte.

Think back to our discussion of inodes and the block allocation tree. When a file needs storage, a whole block is allocated. The data that spills over into a new block is known as the "tail", and the space left over in that block is known as the "slack".
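The tail and slack arithmetic can be made concrete with a small sketch. The names compute_block_usage and struct block_usage are hypothetical, and the sketch treats the tail as the bytes occupying the last block; when the file size is an exact multiple of the block size, it counts that full last block as the tail, with zero slack.

```c
#include <stdint.h>

/* Hypothetical summary of how a file of a given size fills its blocks. */
struct block_usage {
    uint64_t blocks; /* data blocks needed to hold the file */
    uint64_t tail;   /* bytes stored in the last block */
    uint64_t slack;  /* unused bytes left over in the last block */
};

struct block_usage compute_block_usage(uint64_t file_size, uint64_t block_size)
{
    struct block_usage u = {0, 0, 0};
    if (file_size == 0 || block_size == 0)
        return u;

    /* Round up: even one byte past a block boundary needs a whole new block. */
    u.blocks = (file_size + block_size - 1) / block_size;

    u.tail = file_size % block_size;
    if (u.tail == 0)
        u.tail = block_size; /* last block is exactly full */

    u.slack = block_size - u.tail;
    return u;
}
```

For example, with an assumed 4096-byte block size, a 10000-byte file needs 3 blocks: the tail is 10000 - 8192 = 1808 bytes, and the slack is 4096 - 1808 = 2288 bytes.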

Well, what happens if we seek out really far into a file and write exactly one byte? Does it allocate every block in between? Or only the last one? Or is it an error?

The answer is that blocks are allocated only when they are actually needed. So only the block containing the written byte is allocated (plus any indirect blocks needed to reach it in the block allocation tree). The unallocated range in between is known as a hole. When one writes into a hole, a new block is allocated to satisfy the write. When one reads from a hole, the read succeeds, and the bytes it returns are all zeros.
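The behavior described above can be observed directly. This is a minimal sketch assuming a POSIX system and a file system that supports holes (most Unix file systems do); make_sparse is a hypothetical helper, and the offset is chosen by the caller. It seeks far past the end of an empty file, writes one byte, reads a byte back from inside the hole, and collects the file's metadata with fstat().

```c
#include <fcntl.h>     /* open(), O_* flags */
#include <sys/stat.h>  /* struct stat, fstat() */
#include <sys/types.h> /* off_t */
#include <unistd.h>    /* lseek(), write(), read(), close() */

/* Hypothetical helper: creates path, writes one byte at `offset`, then reads
 * one byte from offset 0 (inside the hole). Returns 0 on success, -1 on error. */
int make_sparse(const char *path, off_t offset, struct stat *st,
                unsigned char *hole_byte)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Seeking past end-of-file is legal and by itself allocates nothing. */
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        goto fail;

    /* This write forces allocation of the block that contains `offset`. */
    if (write(fd, "x", 1) != 1)
        goto fail;

    /* Go back to offset 0, which lies inside the hole; the byte reads as 0. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1 || read(fd, hole_byte, 1) != 1)
        goto fail;

    if (fstat(fd, st) != 0)
        goto fail;

    close(fd);
    return 0;

fail:
    close(fd);
    return -1;
}
```

After a call such as make_sparse("/tmp/sparse", 1024 * 1024, &st, &b), st.st_size reports the logical size (1048577 bytes), while st.st_blocks reports the allocated size; on Linux st_blocks counts 512-byte units, so st.st_blocks * 512 is typically only a few KiB on a file system that supports holes.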

Historically speaking, this was very important for core dumps, since a process's virtual address space isn't dense: there is a big "hole" between the stack and the heap.

For our purposes, it means that a hash table stored in a file can be very large and yet consume little disk space, so long as it is relatively sparse. To see the difference, compare "ls -l", which shows the logical size of the file, against "du", which shows the allocated size.
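To illustrate the idea, here is a minimal sketch of a file-backed hash table that relies on holes. Everything in it is a hypothetical choice for illustration, not the lab's required format: fixed-size records, slot i stored at byte offset i * sizeof(struct record), the FNV-1a hash, and linear probing for collisions. Because never-written slots read back as zeros, a used flag of 0 marks an empty slot, so the table needs no initialization pass and only touched blocks get allocated.

```c
#include <fcntl.h>     /* open() and O_* flags, for callers creating the table file */
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define NBUCKETS (1u << 20) /* hypothetical table size: about 1M slots */
#define KEYLEN 32           /* hypothetical max key length, including '\0' */

/* Hypothetical fixed-size on-disk record. A slot that was never written lies
 * in a hole (or past EOF) and reads back as all zeros, so used == 0 = empty. */
struct record {
    uint8_t used;
    char key[KEYLEN];
    int64_t value;
};

/* 32-bit FNV-1a string hash; any reasonable hash would do. */
static uint32_t hash_key(const char *key)
{
    uint32_t h = 2166136261u;
    for (; *key; key++) {
        h ^= (uint8_t)*key;
        h *= 16777619u;
    }
    return h;
}

/* Byte offset of a slot: slot number times record size. */
static off_t slot_offset(uint32_t slot)
{
    return (off_t)slot * (off_t)sizeof(struct record);
}

/* Reads one slot. Bytes past EOF are never filled in, so they stay zeroed. */
static int read_slot(int fd, uint32_t slot, struct record *r)
{
    memset(r, 0, sizeof *r);
    if (lseek(fd, slot_offset(slot), SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, r, sizeof *r) < 0 ? -1 : 0;
}

/* Inserts or updates key with linear probing. Returns 0 on success, -1 on error. */
int ht_put(int fd, const char *key, int64_t value)
{
    if (strlen(key) >= KEYLEN)
        return -1;
    uint32_t slot = hash_key(key) % NBUCKETS;
    for (uint32_t i = 0; i < NBUCKETS; i++, slot = (slot + 1) % NBUCKETS) {
        struct record r;
        if (read_slot(fd, slot, &r) != 0)
            return -1;
        if (!r.used || strncmp(r.key, key, KEYLEN) == 0) {
            memset(&r, 0, sizeof r);
            r.used = 1;
            strcpy(r.key, key); /* fits: length checked above */
            r.value = value;
            /* Writing one record allocates only the block(s) it touches. */
            if (lseek(fd, slot_offset(slot), SEEK_SET) == (off_t)-1 ||
                write(fd, &r, sizeof r) != (ssize_t)sizeof r)
                return -1;
            return 0;
        }
    }
    return -1; /* table full */
}

/* Looks up key. Returns 0 and sets *value if found, 1 if absent, -1 on error. */
int ht_get(int fd, const char *key, int64_t *value)
{
    if (strlen(key) >= KEYLEN)
        return 1;
    uint32_t slot = hash_key(key) % NBUCKETS;
    for (uint32_t i = 0; i < NBUCKETS; i++, slot = (slot + 1) % NBUCKETS) {
        struct record r;
        if (read_slot(fd, slot, &r) != 0)
            return -1;
        if (!r.used)
            return 1; /* reached an empty slot: key is not present */
        if (strncmp(r.key, key, KEYLEN) == 0) {
            *value = r.value;
            return 0;
        }
    }
    return 1;
}
```

After inserting a handful of keys into a freshly created file, "ls -l" can report a size of up to roughly NBUCKETS * sizeof(struct record) (about 48 MiB on common 64-bit platforms, where the record is 48 bytes), while "du" reports only the few blocks that were actually written.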