Lecture 2 (Hardware Support)

Hardware Overview

I'm sure that many of you have discussed the logical organization of a computer's hardware in a previous class, but I'd like to do a quick overview today -- just to make sure that we're all on the same page.
Components
We can think of a computer as a collection of components connected via a collection of wires known as the bus. You can think of the bus as a perfect network -- no errors, no congestion, no failure. The components on this bus might include the following:

CPU -- One or more of these processors can exist in the same computer. They are ultimately responsible for executing programs. Traditionally CPUs perform arithmetic and logical operations. But some CPUs can perform very complex operations. In any case, it is ultimately the operations performed by the processor that control I/O devices, and other system functions.

Volatile Random Access Memory (RAM) -- addressable storage that loses its contents when power is lost.

Memory Controller -- The logic that acts as an interface between the memory and the CPU/devices accessing it. Among other things, the memory controller is necessary to ensure that individual memory accesses remain atomic in the presence of multiple users (multiple CPUs, DMA-based I/O, &c).

I/O Controllers -- The interface between the CPU (or other devices) and an I/O device. In addition to providing the electronic and logical interface to I/O devices, I/O controllers often perform tasks like buffering, queuing, and scheduling. Disk controllers are a common and important type of I/O controller. Others might include the UARTs that drive serial ports, &c.

I/O devices -- devices like keyboard, disk drives, scanners, mice, &c

A Timer -- we'll discuss this special device shortly.

Memory Hierarchy

The next thing I'd like you to remember (if you've seen it before; don't worry about it otherwise) is the storage hierarchy. Computers contain several different types of memory. Faster memory is usually more expensive, so less of it is available. Slower memory is cheaper and therefore more plentiful. It is the goal of the system to use faster memory as often as possible. To achieve this goal, the system tries to keep the items that will be used soonest in faster memory. Often this involves using some policy to estimate what is likely to be used next -- one common policy is to keep the most recently used items in the fastest memory, assuming that they are most likely to be used again soon.
There are several different types of memory in a system. As we move down in this list, we are moving to slower memories that are typically more plentifully available because of a lower cost per unit:

registers -- very small units of memory built into the CPU itself that operate at the same speed as the CPU
L1 cache -- memory that is slightly slower than the CPU; it is typically separate from the CPU but part of the same package.
L2 cache -- memory that is slower than the L1 cache, but faster than main memory. It is usually not part of the CPU's package, but it can be.
Main memory -- The slowest RAM in the system -- but much faster than the next level.
disk -- once used only for "external" or "offline" storage, the existence of demand paging and demand segmentation has turned disk into a massive, but slow secondary RAM. It is also often used for non-volatile storage in portable devices.

Typically, the program manages the registers through instructions that direct the CPU when to load and store values into/out of RAM. The caches are usually managed by hardware that uses a policy such as LRU with write-back or write-through to decide when to read or write values into main memory. Caches are invisible from the software side of things, except for the impact of misses on performance -- and the possible need to flush them upon context-switch. The operating system is generally responsible for the movement of information between main memory and disk -- consequently, it is this part of the hierarchy that we will study in the greatest depth this semester.
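To make the "keep the most recently used items in the fastest memory" policy concrete, here is a minimal sketch of an LRU-managed cache sitting in front of a slower store. The class name LRUCache, the dictionary standing in for slower memory, and the hit/miss counters are all made up for illustration; real caches do this in hardware and also deal with writes, which this sketch ignores.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fast memory of fixed capacity in front of a slower backing store."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing_store = backing_store  # dict standing in for slower memory
        self.lines = OrderedDict()          # least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # Hit: mark this entry as the most recently used one.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Miss: fetch from the slower level and keep a copy.
        self.misses += 1
        value = self.backing_store[address]
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            # Evict the least recently used entry to make room.
            self.lines.popitem(last=False)
        return value
```

With a capacity of 2, reading addresses 0, 1, 0, 2 leaves 0 and 2 in the cache: address 1 is the least recently used when 2 arrives, so it is the one evicted.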

A Special I/O device: The Timer

Consider this: When the CPU is executing a particular program, the PC is moving through that program from one instruction to the next, unless the program, through a branch, instructs it to do something else. In a time sharing environment, for example, the OS might want to regularly switch among tasks in order to appear responsive to all users. But if one program is running, how is the OS to get control?
One approach to this is for all programs to regularly invoke the operating system by making a system call. In this way, programs could periodically yield to the OS, allowing the OS to dispatch another program. Windows 3.x used this approach, called non-preemptive (or cooperative) multitasking. Programmers would periodically include the yield() call in their code. This would invoke the OS's scheduler, which might schedule another task to run.
But this approach was far from perfect. One problem involves a program that "runs away" executing some loop endlessly -- the OS can never get control to let anything else run. The other problem is that the more frequently a program yielded to the OS, the fewer cycles it might get. In the competitive environment in which software is written, some programmers might be tempted to yield less frequently (of course, this could be punished by the scheduler when they finally did yield).
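The yield-based scheme can be sketched with Python generators, where each task hands control back by executing a yield statement. This is only an analogy under stated assumptions: task and run_cooperative are hypothetical names, and a generator's yield stands in for the yield() system call described above.

```python
from collections import deque


def task(name, steps, log):
    """A well-behaved task: do one unit of work, then yield to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntary yield, standing in for the yield() call


def run_cooperative(tasks):
    """Round-robin scheduler that regains control only when a task yields."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # runs until the task's next yield
            ready.append(current)  # still has work: back of the queue
        except StopIteration:
            pass                   # task finished, drop it
```

Note what the sketch cannot do: if a task looped forever without reaching a yield, next(current) would never return and run_cooperative would never get to dispatch anything else. That is exactly the runaway problem.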
Do you remember when I mentioned the timer device earlier? I told you it was special and that we'd talk about it later. Modern computers include hardware timers because they give the OS another way of getting control of the CPU. The timer device is basically a countdown timer, much like one you might use to measure a cooking time (like an hourglass egg timer), or to enforce a time limit for a sporting event. The timer has a register that is initialized, usually at boot time, to a particular value. It counts down from this value to 0, at which time it interrupts the CPU. The interval between timer interrupts is known as a time quantum. A typical quantum might be about 10 ms.
Using this approach, a particular job is left to run for up to one quantum, at which time the timer interrupt goes off and the OS's scheduler is invoked via the timer's interrupt service routine (ISR). At this time the same job or another job might be dispatched to the CPU.
It is important to note that jobs don't always execute for an entire quantum without interruption. Sometimes they perform I/O or do something else that temporarily prevents them from making use of the CPU. Often times libraries are written such that the call yields immediately after dispatching such a request, often called a blocking operation. We'll talk about this more soon.
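The countdown-and-dispatch cycle can be simulated in a few lines. This is a sketch, not how any particular kernel does it: CountdownTimer and run_preemptive are hypothetical names, time is measured in abstract ticks, and for simplicity the counter is reloaded each time a job is dispatched so that every job starts with a full quantum.

```python
from collections import deque


class CountdownTimer:
    """Counts down from reload_value to 0, then calls the ISR and reloads."""

    def __init__(self, reload_value, isr):
        self.reload_value = reload_value
        self.count = reload_value
        self.isr = isr

    def tick(self):
        self.count -= 1
        if self.count == 0:
            self.count = self.reload_value
            self.isr()  # the "timer interrupt"


def run_preemptive(jobs, quantum_ticks):
    """jobs maps a job name to the CPU ticks it needs.

    Returns the sequence of (name, ticks_run) slices that were dispatched.
    """
    ready = deque(jobs.items())
    schedule = []
    expired = []  # the ISR records here that the quantum ran out
    timer = CountdownTimer(quantum_ticks, lambda: expired.append(True))
    while ready:
        name, remaining = ready.popleft()
        timer.count = timer.reload_value  # simplification: fresh quantum
        expired.clear()
        ran = 0
        while remaining > 0 and not expired:
            remaining -= 1
            ran += 1
            timer.tick()
        schedule.append((name, ran))
        if remaining > 0:
            ready.append((name, remaining))  # preempted: back of the queue
    return schedule
```

For example, run_preemptive({"A": 5, "B": 2}, 3) lets A run for a full quantum of 3 ticks, then B for its 2 ticks (it finishes before the quantum expires), then A for its last 2.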

Context Switching

In most cases, processes aren't aware that they are sharing the CPU and other system resources with other processes. Instead, much like the theater, the set has to be torn down and reassembled in between each scene. This takes much work and wastes many cycles. Consider the fact that each process has its own understanding of memory and has different values stored in registers. Consider the effect that switching tasks has on caches. We'll talk more about all of the accounting later, but for now, trust me -- it is expensive.
So why would the OS alternate among processes instead of letting one run until completion? One answer might be time sharing to allow several processes to interact with users, without forcing the users to conform to the computer's schedule (the alternative was getting in line to use the computer -- a common scene in yesteryear). Another reason might be to make use of CPU cycles that would be wasted by a process waiting for a device to complete a request.

Polling vs. Interrupts

Generally speaking, devices are unable to instantaneously respond to instructions. Instead, devices are given instructions, act on them over time, and then respond when they are done. Modern operating systems often try to perform other jobs while one job is blocked waiting for I/O. In order for this scheme to work, it is necessary for the devices to have some way of letting the system know that they need attention.
One way of handling this situation is for the software to periodically poll the device and ask "are you done, yet?" This approach does work, but unfortunately many cycles are wasted as the software nags the hardware.
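A polling loop is easy to picture in code. In this sketch, SlowDevice and poll_until_done are made-up names, and the device's progress is simulated by calling tick() once per poll; the only point is to count how many times the software asks before the answer is finally "yes".

```python
class SlowDevice:
    """Simulated device that needs a fixed number of ticks to finish a request."""

    def __init__(self, ticks_needed):
        self.ticks_left = ticks_needed

    def tick(self):
        # One unit of time passes for the device.
        if self.ticks_left > 0:
            self.ticks_left -= 1

    def is_done(self):
        return self.ticks_left == 0


def poll_until_done(device):
    """Busy-wait on the device; return how many polls got the answer 'not yet'."""
    wasted_polls = 0
    while not device.is_done():
        wasted_polls += 1  # a CPU cycle spent nagging instead of working
        device.tick()
    return wasted_polls
```

Every one of those wasted polls is time the CPU could have spent running some other job.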
Another approach, which is commonplace in modern systems, requires hardware support. This system makes use of hardware interrupts. Imagine a system by which each device has a wire attached directly to the CPU called an Interrupt Request Line (IRQ). When a device wants attention, it can apply a voltage to this line. This voltage is then sensed by the CPU, allowing it to take appropriate action.
But wait! The CPU just follows instructions. The Program Counter (PC) just moves from one instruction to the next instruction, unless directed otherwise via some type of branch, right? Well, generally speaking, yes. But in this case, the CPU and the devices are doing some under-the-table dealing via something called the Interrupt Vector Table. The interrupt vector table is an array of function pointers located at a predetermined address in memory. This address may be hard-wired or stored in a register initialized at boot, or perhaps the entire table is stored in registers within the CPU itself.
At boot time, the addresses for special functions known as Interrupt Service Routines (ISRs) or handlers are stored into this array. Each function is the piece of the operating system responsible for taking the right action in response to a device's request for attention.
Each of the wires that connects a device to the CPU has a number. When an interrupt occurs, the CPU knows which IRQ line is high, and the number of that line is used as the index into the interrupt vector table. The address in the associated element is dereferenced, executing the appropriate service routine. When the routine is done, it restores the CPU to its previous state and the CPU picks up where it left off.
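The "array of function pointers indexed by interrupt number" idea can be sketched as a Python list of functions. Everything named here (NUM_IRQS, install_handler, dispatch_interrupt, timer_isr, and the dictionary standing in for CPU registers) is an illustrative assumption. The sketch also saves and restores the state in the dispatcher for brevity, whereas the text describes the service routine doing the restore.

```python
NUM_IRQS = 8  # assumed number of interrupt lines


def unhandled(irq, cpu):
    """Default entry for lines that have no service routine installed."""
    return f"IRQ {irq}: no handler installed"


# The interrupt vector table: one handler per IRQ number.
interrupt_vector_table = [unhandled] * NUM_IRQS


def install_handler(irq, isr):
    """Done at boot time: store the ISR's 'address' in the table."""
    interrupt_vector_table[irq] = isr


def dispatch_interrupt(irq, cpu):
    """What happens when line `irq` goes high."""
    saved = dict(cpu)                      # save the interrupted state
    handler = interrupt_vector_table[irq]  # index by interrupt number
    result = handler(irq, cpu)             # "dereference" and run the ISR
    cpu.clear()
    cpu.update(saved)                      # resume where we left off
    return result


def timer_isr(irq, cpu):
    """Example ISR; it clobbers registers while it runs."""
    cpu["pc"] = 0xF000
    cpu["r0"] = 0
    return "timer serviced"
```

After install_handler(0, timer_isr), a call such as dispatch_interrupt(0, cpu) runs the timer routine and leaves cpu exactly as it was before the interrupt.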

Well, I have to confess that the world is actually a little bit more complicated than I've alleged so far. There's actually another piece of logic involved called the interrupt arbitrator or interrupt controller. What does it do? Well, it is basically the broker that comes between the individual devices and the CPU. The CPU is (arguably) the most valuable, most contended resource in the system -- and like most valuable resources, it has an administrative assistant to control interruptions. The interrupt arbitrator is exactly that administrative assistant. Much like the boss's secretary comes between the underlings and the boss, the arbitrator comes between the devices' interrupt requests and the CPU.
Each device's IRQ line is connected to the arbitrator, which is connected via one interrupt line to the CPU. When a device requests attention, the interrupt is received by the interrupt arbitrator. If it is okay, the arbitrator interrupts the CPU, which in turn executes the ISR as described above.
Well, how does the arbitrator know whether or not it is okay to disturb the boss? Like a good administrative assistant, it carefully follows the boss's instructions. In order to decide whether or not a particular device should get the CPU's attention, it is helpful to know the relative importance of the devices. For this reason the number associated with each device's IRQ line is often used as a priority -- the lower the number, the more important the device. The arbitrator has a register, set by the CPU, that stores the interrupt level. The arbitrator will only interrupt the CPU if the device's priority is at least as high as this level, that is, if its IRQ number is less than or equal to the value stored in this register. In other words, if this register holds the value 5, we say that the interrupt level is 5, and the CPU will only be interrupted by interrupts 0-5.
Well, what about the other devices? What if they need attention, but the CPU is doing something more important? Well, there is another register that contains one bit for each interrupt. Each time an interrupt occurs, the bit for this interrupt is set. This bit is cleared by the CPU when it services the interrupt. This register allows the CPU to discover what interrupts it may have missed. It is important to note that there is only one bit of information available about each interrupt -- not a queue. If the same interrupt occurs more than once, this is not known to the CPU. The only thing that it can discover is that the interrupt occurred at least once.
There is one more register typically found in an interrupt arbitrator, the interrupt mask. This register holds one bit for each interrupt. The interrupt mask is ANDed with the interrupt register discussed above. If the bit is not set in the interrupt mask, the CPU will not see that interrupt. The interrupt mask allows the CPU to temporarily (or permanently) ignore certain devices, independent of their priority.
In this way, the CPU can interact with the devices in an orderly way.
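The three registers described above (interrupt level, the one-bit-per-interrupt pending register, and the interrupt mask) fit together as in the following sketch. The class and method names are hypothetical, and the choice to deliver the lowest-numbered eligible interrupt first is an assumption that follows from "the lower the number, the more important the device"; real controllers differ in the details.

```python
class InterruptArbitrator:
    """Toy model of the arbitrator's level, pending, and mask registers."""

    def __init__(self, num_irqs=8):
        self.num_irqs = num_irqs
        self.level = num_irqs - 1         # admit IRQs 0..level
        self.pending = 0                  # one bit per IRQ, set when it fires
        self.mask = (1 << num_irqs) - 1   # a 0 bit hides that IRQ from the CPU

    def request(self, irq):
        """A device raises its line. Only one bit is kept, so repeats are lost."""
        self.pending |= 1 << irq

    def next_interrupt(self):
        """Return the most important IRQ the CPU may see now, or None."""
        visible = self.pending & self.mask  # the mask is ANDed with pending
        for irq in range(self.level + 1):   # lower number = higher priority
            if visible & (1 << irq):
                return irq
        return None

    def acknowledge(self, irq):
        """The CPU clears the pending bit when it services the interrupt."""
        self.pending &= ~(1 << irq)
```

With the level set to 5, two requests on line 3 and one on line 6 result in a single delivery of interrupt 3. Interrupt 6 stays pending until the CPU raises the level to admit it, and it disappears from view again if its mask bit is cleared.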

Protection

At this point let me point out that with several different users sharing the same resources, the operating system needs to act a bit like a police officer and keep the order. It would be an interesting world if students could read my gradebook on the Andrew system and see each other's grades -- or if faculty and staff could see each other's salaries, &c. But the OS is a piece of software, like any other, so how can it do this?
The answer, as was the case for preemptive scheduling, is that the OS needs some level of support from the hardware. Hardware can enforce limits on what memory addresses a particular program can access, &c. The OS can change these values by using privileged instructions. These instructions may be incorporated into ISRs, or they may be invoked via special instructions called traps. In either case, the hardware verifies that the user is in fact the OS (or other privileged process) before allowing the instruction. If this isn't the case, an exception occurs. Exceptions are often handled in a way similar to interrupts, via a service routine.
As you'll see when you write your kernel, a trap is the entry point for all syscalls. The trap causes the hardware to enter system mode. At this point, the handler can verify that it is okay to perform whatever the privileged operation might be. After verifying everything, the kernel will then perform the privileged operation and return to user mode. Returning to user mode basically involves setting a register. The trap is only required to go the other way -- into system mode. The reason is that entering system mode needs to be done in a very controlled way, through the syscall interface, to ensure that the syscall is legitimate and correct.
Make a note now. We'll have a lot of fun with this part of your project #3. We'll try all sorts of evil things to make sure you check for them before entering the meat of a syscall -- some of them will be designed to emulate programmer errors, while others will be designed to emulate malicious acts.
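Here is a rough sketch of the trap-then-verify pattern described above. It is a toy model, not the project's interface: Machine, ProtectionFault, sys_poke, the SYSCALLS table, the syscall number 1, and the base/limit pair describing the caller's memory are all invented for illustration.

```python
USER_MODE, SYSTEM_MODE = 0, 1


class ProtectionFault(Exception):
    """Raised when a privileged operation is attempted in user mode."""


class Machine:
    def __init__(self, memory_size, user_base, user_limit):
        self.mode = USER_MODE
        self.memory = [0] * memory_size
        self.user_base = user_base    # start of the caller's memory region
        self.user_limit = user_limit  # size of that region

    def privileged_store(self, address, value):
        # The "hardware" check: only system mode may execute this.
        if self.mode != SYSTEM_MODE:
            raise ProtectionFault("privileged instruction in user mode")
        self.memory[address] = value

    def trap(self, syscall_number, *args):
        """The only controlled way into system mode."""
        self.mode = SYSTEM_MODE
        try:
            handler = SYSCALLS.get(syscall_number)
            if handler is None:
                return -1             # unknown syscall: refuse
            return handler(self, *args)
        finally:
            self.mode = USER_MODE     # returning is just setting a register


def sys_poke(machine, address, value):
    """Hypothetical syscall: store a value, but only inside the caller's region."""
    in_bounds = machine.user_base <= address < machine.user_base + machine.user_limit
    if not in_bounds:
        return -1                     # verify before doing anything privileged
    machine.privileged_store(address, value)
    return 0


SYSCALLS = {1: sys_poke}
```

A well-formed request such as m.trap(1, 9, 77) on a machine whose user region covers address 9 succeeds, while an out-of-range address or an unknown syscall number is rejected before any privileged work is done. Calling privileged_store directly from user mode raises ProtectionFault.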