Networks and Concurrency
Thus far this semester, we've considered developing and using synchronization primitives to manage concurrent access to critical resources -- in situations where shared memory exists among the competing processes/threads. But, the techniques we've discussed don't help us absent shared memory. And, as soon as we're trying to coordinate activities over a network -- shared memory goes away.
Over the next few classes we'll discuss how computers coordinate to share common network channels. Then, we'll discuss how we can coordinate computation over a network as part of a distributed system.
The Nature of the Pipe
Individual stations of a network communicate over some shared channel. When, by design, there are only two stations associated with a particular channel, we call this a point-to-point network. When the channel is shared by more than two stations, it can go by different names, depending on the sharing discipline.
A point-to-point network may be simplex, half duplex, or full duplex. In a simplex channel, communication goes in only one direction -- messages cannot travel the other way. In a half duplex channel, communication can go in both directions, but in only one direction at a time. In a full duplex channel, communication can go in both directions at once. It is possible to assemble a full-duplex connection from two opposing simplex connections.
Very little coordination is necessary in a simplex connection. The sender can send whenever it wants. At most, flow control is needed so the sender doesn't send when the receiver is not ready. But, since the simplex channel is only available to one sender, it isn't really shared and coordination is unnecessary.
In a full duplex channel, unless it is composed of two completely independent one-way channels, we'll need -- as with other shared channels -- to define some sharing discipline so that the two sides don't garble each other's transmissions by sending at the same time. The rest of today's discussion will be dedicated to describing some of those techniques.
Static Time Division Multiplexing (TDM) - Time slicing
Perhaps the most straight-forward form of concurrency control is Time Division Multiplexing (TDM). TDM schemes give each of the stations exclusive use of the channel -- for a limited time each. This can be accomplished, for example, through the use of a switch which moves round-robin from station to station in a star configuration, giving each station its fixed-size slice.
Static time-slicing is a nice solution in that it is easily understood and easy to implement in hardware -- it is also very robust as it is enforced by the switch rather than the stations, themselves. When all of the stations exhibit fairly uniform behavior, it is a very efficient solution.
It is interesting, though, that in many cases demand is not uniform, but is, instead, bursty. Bursty demand is characterized by brief periods of very high activity -- followed by long periods of silence. Static time-slicing is not particularly efficient for bursty loads, precisely because the network's capacity is divided evenly among the stations -- even though most stations need none and a few could make use of more. You can imagine the switch offering time-slice after time-slice to stations that are uninterested -- only to keep the few presently active users waiting.
This brings us to an interesting characteristic of static time-slicing schemes. They are very efficient when demand is uniform and high -- but introduce significant latency and waste network time by treating all stations, even the disinterested, as active users.
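To make the waste concrete, here's a tiny simulation sketch. The station demands, slice size, and rotation count are invented purely for illustration:

```python
# Sketch: a round-robin switch offers each station one fixed-size slice per
# rotation. A station with nothing queued wastes its slice entirely.

def static_tdm(demands, slice_size, rotations):
    """demands: units of data queued at each station at the start."""
    remaining = list(demands)
    used = wasted = 0
    for _ in range(rotations):
        for i in range(len(remaining)):
            sent = min(remaining[i], slice_size)
            remaining[i] -= sent
            used += sent
            wasted += slice_size - sent   # idle time inside this slice
    return used, wasted

# Bursty load: one station has a large burst queued, the rest are silent.
used, wasted = static_tdm([100, 0, 0, 0], slice_size=10, rotations=5)
print(used, wasted)   # 50 150 -- the burst crawls along at 1/4 of capacity
                      # while 3/4 of the network time is handed to idle stations
```

The bursty station would happily use every slice, but the switch keeps offering time to the disinterested.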
Token-ring is another time-division scheme. It can be more adaptable to demand than static time-slicing, but carries a higher overhead cost, and is more brittle in the event of host failure.
The basic idea is the same -- we divide up the network time such that each station gets a window of exclusive access. But, the mechanism for dividing up the time is different and cooperative among the stations, rather than centralized in hardware at the switch. The stations organize themselves into a ring -- either logical or physical. Then, get this, they pass around a token, a special message that conveys to the receiver the right to transmit.
The token moves around and around the ring, from station to station. If a station doesn't need the token, it is free to pass it along immediately upon receiving it. Similarly, the stations agree to give up the token after some maximum amount of time. Basically, the stations agree to a few things:
- To form a logical ring (possibly wired into a physical ring)
- Not to transmit until receiving the "token"
- To stop transmitting upon "passing the token" to the next station
- Not to hold the token for more than an established amount of time.
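Those rules can be sketched as a station's behavior while holding the token. This is a toy model -- real token-ring frames, timers, and ring wiring are more involved, and a per-visit frame budget stands in for the maximum hold time:

```python
from collections import deque

MAX_FRAMES_PER_TOKEN = 3   # stand-in for the maximum token-hold time

class Station:
    def __init__(self, name):
        self.name = name
        self.outbox = deque()   # frames queued locally
        self.sent = []

    def hold_token(self):
        # Rule: transmit only while holding the token, for a bounded window.
        budget = MAX_FRAMES_PER_TOKEN
        while self.outbox and budget > 0:
            self.sent.append(self.outbox.popleft())
            budget -= 1
        # Returning models passing the token along; an idle station
        # returns (passes the token) immediately.

def run_ring(stations, rotations):
    for _ in range(rotations):      # the token circles the logical ring
        for s in stations:
            s.hold_token()

a, b = Station("A"), Station("B")
a.outbox.extend(["a1", "a2", "a3", "a4"])   # A is bursty, B is idle
run_ring([a, b], rotations=2)
print(a.sent)   # ['a1', 'a2', 'a3', 'a4'] -- drained in two token visits
```

Notice that idle B costs almost nothing per rotation -- it just relays the token -- which is exactly the advantage over a fixed slice.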
In some ways, a token-ring system is very desirable. In the event of low contention and bursty usage, it introduces less latency and wastes less network time than static time-slicing, because an idle station can immediately pass the token, rather than maintaining its right to send for a fixed time slice. Yet, on the flip side, under high contention, the system acts like static time-slicing, because each station gets the network for the fixed maximum amount of time.
But, the overhead is higher than fixed time-slicing, because the token must be passed around and managed as a message. And, in the event that a station fails while holding the token, it can take a long time to recover. A station can only detect the failure if it waits longer than N (stations) * T (max time) for the token to come back around. And, even after detecting the failure, it takes a full cycle to regenerate the token, to ensure that multiple tokens aren't generated.
Frequency Division Multiplexing (FDM) and Wave Division Multiplexing (WDM)
Another way of dividing up the network among stations is to allow each station to transmit any time it wants -- but only over a subset of the available band (frequency space). When the frequency space is divided in RF via electronics, we call it Frequency Division Multiplexing (FDM). When it is done via lenses and prisms, with colors of light, as is the case with fiber optics, we call it Wave Division Multiplexing (WDM). But, in either case, it is a dividing up of the frequency space among senders.
This approach is nice because it does not introduce the latency associated with time slicing -- a station is free to begin transmitting at any time. It is also very easy to implement in hardware -- which provides a natural and robust enforcement mechanism. But, when the number of stations is high, it might not be possible to carve out a piece of the frequency space for each one, or the bandwidth, and corresponding data rate, of each piece would be too low.
And, just as was the case for the TDM schemes, it is a static partitioning. So, it is very good for uniform usage -- but results in wasted network capacity under bursty conditions. Much as token-ring schemes introduced overhead associated with the token messages, and time-slice schemes introduced overhead in the form of gaps to switch between time slots, FDM loses some bandwidth to safety margins between the frequency spaces associated with each station.
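As a rough sketch of what those safety margins cost, here's the arithmetic. All the numbers are invented for illustration, not drawn from any real standard:

```python
# Sketch: dividing a band evenly among stations, with a guard band between
# each pair of adjacent sub-bands.

def fdm_share(total_hz, stations, guard_hz):
    # (stations - 1) guard bands separate the station sub-bands.
    usable = total_hz - (stations - 1) * guard_hz
    return usable / stations

# A 20 MHz band, 10 stations, 100 kHz guard between adjacent sub-bands:
per_station = fdm_share(20_000_000, 10, 100_000)
print(per_station)   # 1910000.0 Hz each, vs. an even 2 MHz with no guards
```

Note also what the function doesn't model: a station's share is fixed whether it is transmitting or idle, which is the static-partitioning problem all over again.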
Broadcasting -- A Dynamic Approach
If we consider bursty traffic, rather than a uniform load, we realize that, most of the time, the network is idle. Given a network with sufficient capacity for the load, most of the time will be idle time -- only punctuated by short bursts. Given this, it might make sense to allow stations to transmit whenever they want -- and just hope for the best.
And, in fact, this is a viable technique -- so long as we have a plan for managing the inevitable collision, when two or more stations transmit at the same time mangling each other's transmissions.
And, for a quick cut at this, it isn't a terribly hard problem to solve. Should a transmission become mangled, it will be rejected because its checksum will be bad, so the network software will discard it before it ever becomes visible to the application software. Depending on the situation, the network protocol might cause it to be retransmitted, such as via an ack-based resend protocol for bulk data, or it might just make do without the transmission, as might make sense for real-time variable-quality services, such as audio or video streams.
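As a sketch of that receive path, here's how mangled frames might be filtered out before the application ever sees them. CRC-32 stands in for the checksum, and the frame format is invented:

```python
import zlib

def make_frame(payload: bytes) -> bytes:
    # Append a CRC-32 so the receiver can detect mangled transmissions.
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def deliver(frame: bytes):
    payload, crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != crc:
        return None       # mangled: silently discard before the app sees it
    return payload        # intact: hand up to the application

good = make_frame(b"hello")
bad = bytearray(good)
bad[0] ^= 0xFF            # simulate a collision flipping bits in transit
print(deliver(good))        # b'hello'
print(deliver(bytes(bad)))  # None -- dropped by the network layer
```

Whether anything further happens -- a resend, or just moving on -- is the protocol's choice, as described above.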
But, we have to be a little careful about how we resend. Imagine that two senders mangle each other's transmissions and immediately resend. They'll mangle each other's transmissions again. The same is true even if they wait some fixed time-out period. In order to escape this trap, they should wait some random amount of time before resending.
We'll talk more about the details momentarily. For the moment, let's revisit the requirements and performance characteristics one more time. For this to work, we need to have multiple stations sharing a broadcast medium, such as the same wire or air transmission space. The system works really well when contention is low and traffic is bursty, because it allows a single station to capture the entire network capacity. But, it collapses quickly as contention rises, because collisions occur, garbling transmissions, requiring retransmissions, and making things worse. It offers little advantage, and significant risk, over static division techniques when access patterns are uniform, because static techniques can efficiently share the network if users have consistent needs and won't behave badly as usage increases. Broadcast techniques, even when well designed and implemented, tend to suffer badly when overall utilization is over about 30% -- even if it is bursty -- because collision becomes too probable.
Carrier Sense Multiple Access (CSMA)
It makes little sense for one station to begin a transmission if another station is presently transmitting, as a collision is almost guaranteed. The incidence of collision can be significantly reduced through what is known as carrier sense multiple access (CSMA). CSMA is a long phrase for something very simple -- listen before you leap. A station only begins to transmit if the "line is quiet". This, of course, does not guarantee that collisions won't occur. More than one station could listen, hear nothing -- and transmit at the same time. But, it does reduce the likelihood.
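A toy model of a single time step makes the point -- listening helps, but two stations that both hear silence can still collide:

```python
def csma_step(ready_stations, line_busy):
    """One time step of CSMA: each ready station senses the carrier and
    transmits only if the line sounds quiet. A toy model, not a real MAC."""
    if line_busy:
        return "defer"            # listen before you leap
    if len(ready_stations) > 1:
        return "collision"        # all heard silence -- and all transmitted
    return "send" if ready_stations else "idle"

print(csma_step(["A", "B"], line_busy=True))    # defer
print(csma_step(["A", "B"], line_busy=False))   # collision -- CSMA reduces,
                                                # but does not eliminate, risk
print(csma_step(["A"], line_busy=False))        # send
```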
Carrier Sense Multiple Access with Collision Detection (CSMA/CD)
Let's refine our understanding of the wire. The channel is not ideal -- it takes time for a signal to propagate. And, there is a limit as to how fast data can be placed into the channel. We call this limit the maximum bit rate. Lastly, there is attenuation, or loss of strength, as a signal propagates.
The propagation delay is interesting, because it means that all stations are not hearing the same thing on the wire at the same time. It also means that we can have more than one transmission on the wire at the same time for a while before they collide. This means that, if left unconstrained, if a collision occurs, certain stations might see it, but not others.
This is especially true when considered in light of attenuation. It is possible that a sender at one end of a wire will clearly garble a signal for nearby senders, but that its signal will be too weak to seem like more than noise to distant stations.
It would be nice to ensure that we have one network that works the same for all stations, rather than a confusing mish-mash. In order to achieve this, we're going to impose a minimum transmission size and force stations to jam other transmissions upon detection of a collision.
The bit rate is interesting in setting the minimum transmission size, because of how it interplays with the propagation delay. If we consider the amount of time it takes for a signal to propagate from one sender to another, we can figure out how many bits are in transmission at any point in time. We refer to this as the bandwidth-delay product. For example, if the propagation delay is 10 ms and the bit rate is 10,000,000 bits/second, we find:
10 ms * (1 s / 1,000 ms) * (10,000,000 bits / 1 s) = 100,000 bits
The bandwidth-delay product tells us how long we need to continue to transmit in order to be able to detect a collision during transmission. Imagine a very long wire with a very significant delay. Now imagine a station at each end sending exactly one bit at the very same time. The transmissions will collide in the middle of the wire. Before the collision can be detected, it will have to propagate all the way to each end, just as the message would have, absent the collision. This takes time. So, by the time the energy from the colliding transmission makes it back to the other sender, it is done sending. It sent its entire message -- and heard no collision.
So, in order for collision detection to work, there has to be a guarantee that a sender will still be sending at the time that a colliding transmission reaches the station. It is easy to see that, if two stations at opposite ends of the wire begin transmitting at exactly the same time, they'll need to transmit for the full propagation time in order to be able to detect a collision -- otherwise they would stop transmitting before the collision can be detected. This implies a minimum frame size equal to the bandwidth-delay product.
But, it actually gets worse than this. The worst case for a collision occurs not when the two stations start at the same time, but when they would only overlap in their last bit. This implies that the minimum size must actually exceed twice the bandwidth-delay product.
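Using the example numbers from earlier (10 ms one-way delay, 10,000,000 bits/second), the minimum frame size works out as:

```python
# Minimum frame size so a sender is guaranteed to still be transmitting
# when evidence of a worst-case collision gets back to it.

def min_frame_bits(prop_delay_ms, bit_rate_bps):
    # Worst case: the collision happens at the far end, and the evidence
    # must travel all the way back -- a full round trip of sending.
    return 2 * bit_rate_bps * prop_delay_ms // 1000

print(min_frame_bits(10, 10_000_000))   # 200000 -- twice the 100,000-bit
                                        # bandwidth-delay product
```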
Now, back to the attenuation issue. In order to ensure that collided transmissions are garbled for everyone, regardless of what is being sent or how powerful the signal, senders will send a jamming signal upon detecting a collision. Specifically, should a sender detect a collision before the end of its transmission slot, it will stop sending the actual message and switch over to a jamming signal that will whack the CRC for other receivers. Needless to say, it also prevents other senders from sending.
The last detail is that, upon detecting a collision via the jam signal, senders should stop transmitting -- and back off for a random amount of time to avoid a synchronized repeat performance. A common model is "exponential random backoff". The retry delay is chosen at random, and the range of possible delays grows exponentially with each retransmission attempt. The idea is that one collision may be a fluke -- but an increasing number of collisions indicates genuine contention, which is best managed by spreading the transmissions out over a longer period of time. If the workload is truly low-contention and bursty, this will act to spread the bursts out into the larger quiet times.
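Here's a sketch of Ethernet-style binary exponential backoff; the cap on the exponent is illustrative:

```python
import random

def backoff_slots(attempt, max_exp=10):
    """After the attempt-th consecutive collision, wait a random number of
    slot times drawn uniformly from [0, 2^attempt - 1] (exponent capped)."""
    exp = min(attempt, max_exp)
    return random.randint(0, 2 ** exp - 1)

# The range of possible delays doubles with each successive collision,
# spreading competing retransmissions over an ever-longer window:
for attempt in range(1, 5):
    print(attempt, "-> delay in [0,", 2 ** attempt - 1, "]:", backoff_slots(attempt))
```

Note that it is the *range* that grows exponentially; the delay itself is drawn uniformly from that range, so two colliding stations quickly become unlikely to pick the same slot twice.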
Ethernet is a CSMA/CD protocol that uses a binary exponential random backoff to stagger retransmissions after collision detection.
Question for Thought
Satellites are very high bandwidth, very high latency communication channels that are exposed to thousands of kilometers of highly variable environmental noise.
Although the antennas on the ground can all communicate with the satellite, they cannot necessarily hear each other. This, by itself, makes a broadcast-based multiple-access scheme unsuitable for the uplink from the ground stations to the satellite. But, let's forget about reality for a moment.
Let's make the reality-contradicting, blatantly-false, obviously-wrong assumption that the ground stations communicating with the satellite can all hear each other. Would it make sense to use a CSMA/CD scheme to manage contention for the uplink bandwidth?
Satellites can support bit rates in the gigabit/second range. Latency is typically about 270 ms -- roughly 1/4 second. Consider the bandwidth-delay product -- the minimum frame size would have to be in the hundreds-of-megabits-to-gigabits range, tens to hundreds of megabytes. This seems funky on its face, especially if smaller messages need to be sent. But, now consider the impact of noise periodically whacking at those huge messages. Keep in mind that the likelihood of a frame being garbled in transit is proportional to its size. This is just simple probability -- the bigger the target, the more likely it is to be hit.
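Running the numbers for a hypothetical 1 gigabit/second uplink with 270 ms one-way latency:

```python
# Hypothetical satellite link: 1 gigabit/second, 270 ms one-way latency.
bit_rate_bps = 1_000_000_000
one_way_delay_ms = 270

bdp_bits = bit_rate_bps * one_way_delay_ms // 1000   # bandwidth-delay product
min_frame = 2 * bdp_bits      # CSMA/CD minimum frame: twice the product

print(bdp_bits // 8 // 1_000_000)    # 33 -- megabytes in flight one way
print(min_frame // 8 // 1_000_000)   # 67 -- megabytes of minimum frame size
```

A ~67 megabyte minimum frame, exposed to a quarter-second of noisy sky, is a very big target indeed.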
Clearly satellites don't use CSMA/CD. They use time division, and possibly also frequency division. Very few users could realistically make productive use of all of a satellite's capacity at once, anyway, even during a burst.
Consider a collection of computers, for example on campus, forming an ad hoc wireless network. By an ad hoc network, I mean one that is self-organizing based on which laptops happen to be in radio range of each other, rather than one that is structured in advance, whether by human or algorithmic design.
Think about the laptops as radio transmitters that can talk to other laptops within some small radius. But, remember that because of hills, valleys, buildings, cars and trucks, &c the exact radius isn't perfect or predictable. Would it make sense for these laptops to communicate using CSMA/CD? What about just CSMA, without CD?
Nope, and nope. You could sense the carrier -- but the value would be limited. Just because the sender hears chatter doesn't imply that the intended recipient also hears the same chatter -- the source could be closer to the sender than to the recipient. Similarly, the value of listening is further reduced given that the recipient might be disturbed by chatter that is too far away from the sender for the sender to notice.
As far as collision detection is concerned, the same is true. Noise audible at the sender may or may not matter at the receiver -- so why stop? And, if one stopped at the sound of any noise, given that audibility isn't transitive, consider how much this limits the opportunity to send.