November 6, 2012 (Lecture 20)

Introduction

This lecture continues the group of lectures that we began last class: having backed away from the details of the lower-level network protocols, we examine how networks are used within specific applications. By the schedule, today's lecture is about Voice over IP (VoIP), and we'll talk about that. But, more importantly, we'll examine the properties of networks as they are perceived by users.

Network Performance Is About Latency, Latency, and Latency.

...nothing else matters.

In classes about networks, we're fond of differentiating between throughput and latency. We explain that throughput is the amount of work performed per unit time, such as the number of bits transferred by a network per second, and latency is how long it takes for something to happen. We explain, for example, that the start-to-finish latency is the sum of the time it takes for a message to propagate from a client to a server, for the request to be processed, and for the server's reply to reach the client.

But, in this conversation we lose the fact that, in the end, it is only latency, not throughput, that matters to the end user. An arbitrarily large amount of data can be moved through a computer network, regardless of its data rate -- given sufficient time.

The same is true if we consider robustness. If we don't care about latency, we can tolerate a network with a very low availability or a very high error rate -- as we can just wait and send, or resend, later. (Note: We cannot tolerate arbitrarily low availability or arbitrarily high error rates, because frames cannot be arbitrarily small, so there is a minimum amount of data that must get through intact -- for example, consider the viability of a one-bit frame.)

Put another way, any time we increase a network's capacity, or improve its reliability, we are making only one user-measurable improvement -- reducing its latency.

Sources of Latency

In thinking about latency, it is important to consider what causes latency -- in other words, where we spend time. We spend time preparing messages for transmission over a network. This might involve gathering data fields, encoding them for portability, adding metadata, compressing the message, adding networking headers, etc. I call this Normalization and Representation Latency. We spend time clocking the bits of a message onto the network, which I call serialization latency. We spend time waiting for these bits to physically move through the media, likely at the speed that light/EM/RF travels through the media, which is known as propagation latency. We spend time somewhat, but not exactly, mirroring what we did before: deserializing the message as it is clocked off the media back into memory, processing the message, etc. And, along the way, as a message passes through routers, etc., it might well be partially or completely serialized, deserialized, and processed -- and maybe even wait around in queues, resulting in queuing latency.
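To make these components concrete, here is a small back-of-the-envelope sketch (not from the lecture, and with purely illustrative names and numbers) that adds up the latency pieces described above for a single one-way message:

```python
# Sum the latency components for one one-way message.
# All parameter names and example values are illustrative assumptions.

def one_way_latency(message_bits,
                    prep_seconds,         # normalization/representation latency
                    data_rate_bps,        # link data rate, for serialization
                    distance_m,           # physical path length
                    propagation_mps=2e8,  # rough signal speed in fiber/copper
                    per_hop_queue_s=0.0,  # queuing latency at each router, if any
                    hops=0,
                    processing_seconds=0.0):
    serialization = message_bits / data_rate_bps
    propagation = distance_m / propagation_mps
    queuing = hops * per_hop_queue_s
    return prep_seconds + serialization + propagation + queuing + processing_seconds

# Example: 12,000-bit packet, 100 Mb/s link, 1,000 km path, 3 hops of 1 ms queuing.
print(one_way_latency(12_000, 0.0001, 100e6, 1_000_000,
                      per_hop_queue_s=0.001, hops=3))
```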

Reducing Latency

If we want to reduce latency, we need to attack one of these causes. For example, if we want to reduce propagation latency, we can get a gain by moving from a slower media, like copper wire, to a faster media, like optical fiber (albeit a relatively small gain) -- or, we can reduce the length of the media, for example by replacing a long-haul network connection to a server with a short, local connection and co-location. We can get faster processors, to reduce processing delay, maybe reducing the need for queuing. We can upgrade our network to one with a faster data rate to reduce serialization latency -- or, perhaps, use a denser representation or compress messages, to reduce the amount of data that needs to be serialized or deserialized (but be careful: we don't want to add more processing latency than we save in serialization).
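As a rough illustration of that "be careful" caveat, here is a hedged sketch comparing sending a message as-is against compressing it first; the link rate, compression ratio, and codec cost are all assumed values:

```python
# Compression only wins when the serialization time it saves exceeds
# the processing time it adds. The numbers below are assumptions.

def send_time(bits, data_rate_bps, extra_processing_s=0.0):
    return bits / data_rate_bps + extra_processing_s

raw_bits = 8_000_000           # 1 MB message
compressed_bits = 2_000_000    # assume 4:1 compression
codec_cost_s = 0.015           # assumed compress + decompress time
rate = 100e6                   # 100 Mb/s link

plain = send_time(raw_bits, rate)
packed = send_time(compressed_bits, rate, codec_cost_s)
print("uncompressed:", plain, "s; compressed:", packed, "s")
# Here compression wins (0.08 s vs 0.035 s); on a 10 Gb/s link the same
# codec cost would make it a loss.
```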

You get the idea: We reduce latency by attacking one of its causes, and increasing the data rate is just one common way of doing that. It decreases serialization latency and increases the available network time, which may prevent queuing latency as messages wait to be sent. In so doing, it attacks two causes of latency: serialization and queuing.

Jitter

Jitter is an important measure of network performance, most especially for real-time streaming media, e.g. voice, video, music, etc. Qualitatively, jitter is a measure of the smoothness of a stream of data. A stream with low jitter is smooth and constant. A stream with high jitter is jerky, halting at times and speedy at others.

In short, the lower the jitter, the more constant the data rate. The higher the jitter, the more variable the data rate. And, for example, when it comes to real-time media streaming, the smoother the data, the smoother the audio or video stream, etc.

There is no one standard formula for jitter. Some calculations are very straightforward; others use complicated and subtle models, for example, to weight changes over time, reject outliers, etc. One simple way of calculating jitter for a sequence of packets is to measure the latency of each packet, find the differences in latency between successive packets, and average these. But, again, there are many other ways.
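For example, that simple calculation might look something like this (a minimal sketch; real protocols such as RTP use smoothed, more elaborate estimators):

```python
# Average the magnitude of the latency differences between successive packets.

def simple_jitter(latencies_s):
    if len(latencies_s) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(latencies_s, latencies_s[1:])]
    return sum(diffs) / len(diffs)

print(simple_jitter([0.020, 0.020, 0.020]))        # constant latency -> zero jitter
print(simple_jitter([0.020, 0.080, 0.025, 0.090])) # variable latency -> high jitter
```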

If the latency is constant, no matter how high or how low, jitter is zero. If the latency of one packet varies wildly from that of another, the jitter is high.

Causes of Jitter, and Jitter Reduction/Mitigation

Ultimately, what is the root cause of jitter? Well, latency, of course. Remember, latency is all that really matters. If a network has no latency, jitter is necessarily 0 -- each and every packet has a latency of 0, with no variation.

As the maximum latency increases, so does the potential (note the word, potential) for jitter. If each and every packet is delayed by the same large amount, jitter is still 0. But, if some packets are more latent than others, we now have jitter. The maximum amount of jitter we can have is limited by the difference between the fastest possible delivery and the slowest possible delivery.

Because of this, the most natural way of reducing jitter is, perhaps, decreasing latency. Less latency, less jitter. So, for example, if we increase the available network time by increasing the data rate, we'll reduce queuing latency. If more packets move through without queuing, the variability in latency introduced by occasional and variable queuing goes away -- and with it, jitter is necessarily reduced.

Another technique for reducing jitter should call to mind the old saying, common in distributed systems, "It is easier to move a problem than to fix it." Jitter can be reduced simply by throwing away the outliers -- after waiting a certain amount of time, move on and never look back. In this case, we accept whatever penalty is to be suffered for lacking the data, instead of the penalty to be had by waiting for it: Problem....moved!
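A minimal sketch of this "throw away the outliers" approach, with an assumed playout deadline:

```python
# Anything arriving after a fixed deadline is simply discarded;
# the deadline value is an illustrative assumption.

PLAYOUT_DEADLINE_S = 0.150  # assumed tolerable one-way delay

def accept_packet(send_time_s, arrival_time_s):
    """Return True if the packet arrived in time; late outliers are dropped."""
    return (arrival_time_s - send_time_s) <= PLAYOUT_DEADLINE_S
```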

Another technique for reducing jitter is to use buffering to delay messages by some minimum amount of time. In other words, we add latency to reduce jitter. This gives us smooth consistency at the cost of a delayed stream. This might be better, depending upon the application. But, buffering is finite, so the solution has limits. And, again, at best: Problem....moved!
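A hedged sketch of such a playout (de-jitter) buffer; the fixed delay and the data structure are illustrative assumptions, not a prescribed design:

```python
# Hold every packet until a fixed playout delay has elapsed since it was sent,
# so latency variation smaller than that delay is hidden from the application.

import heapq

PLAYOUT_DELAY_S = 0.200  # assumed added latency, traded for smoothness

class PlayoutBuffer:
    def __init__(self):
        self._heap = []  # (playout_time, arrival_order, packet)
        self._seq = 0

    def arrive(self, send_time_s, packet):
        heapq.heappush(self._heap, (send_time_s + PLAYOUT_DELAY_S, self._seq, packet))
        self._seq += 1

    def ready(self, now_s):
        """Pop and return all packets whose playout time has been reached."""
        out = []
        while self._heap and self._heap[0][0] <= now_s:
            out.append(heapq.heappop(self._heap)[2])
        return out
```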

The last technique for reducing the impact of jitter is to mitigate it using redundancy. If we send redundant data, and some of it is latent, we can ignore it -- using the most timely copy available. This is especially useful if the redundancy includes both redundant data and redundant paths. This reduces jitter both by using redundancy to allow outlier rejection without data loss, and by giving us the least latent path at any time. (Note that this improves fault tolerance, too.)
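A minimal sketch of this redundancy approach: the same packet is sent over several paths, and the receiver keeps whichever copy arrives first, discarding the slower duplicates (the packet format and dedup structure are assumptions):

```python
# Keep only the first (least latent) copy of each sequence number.

seen_sequence_numbers = set()

def receive(copy):
    """copy is a dict like {'seq': 17, 'path': 'A', 'payload': b'...'}."""
    if copy['seq'] in seen_sequence_numbers:
        return None                      # a slower duplicate; ignore it
    seen_sequence_numbers.add(copy['seq'])
    return copy['payload']               # the most timely copy wins
```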

Of course, reducing jitter by increasing redundancy is not free. Sending more copies of data consumes more network time. This means that we have less network time available. So, we can either do less with the network, or do the same at a lower quality, etc.

For example, in the end, the trade-off might be: accept a less smooth playback due to jitter ... or ... accept a lower video resolution, a lower frame rate, send fewer streams ... or ... buy more network time. (Alternative: Invent better compression, a more dense encoding, etc).

Attributes and Applications

Consider moving a truly large number of truly huge files over a relatively short distance. Which types of latencies do you care about? Do you care about jitter?

For a bulk data application such as this, we really only care about average throughput over the long haul. We don't care about the propagation speed of the media -- the distance is short. We don't care about queuing latency or other variable latency sources, because we don't care about jitter. We care, simply, about throughput -- bits per second.

What about streaming relatively short audio or video? Well, because we can buffer the stream before playing it, the same is pretty much true. As long as we can reasonably delay enough at the beginning via buffering to slow down the fastest packets to match the slowest packets, we won't suffer any jerkiness. Of course, some reasonableness applies here -- a user will likely wait seconds, but not minutes or years, to buffer a stream.
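As a back-of-the-envelope illustration (with assumed numbers): the startup delay only needs to cover the spread between the slowest and fastest packets, and the buffer must hold that many seconds of the stream:

```python
# Estimate startup delay and buffer size from the latency spread.
# All values below are illustrative assumptions.

min_latency_s = 0.030
max_latency_s = 2.500
stream_rate_bps = 5e6            # assumed 5 Mb/s video stream

startup_delay_s = max_latency_s - min_latency_s
buffer_bits = startup_delay_s * stream_rate_bps
print(startup_delay_s, "s of startup delay,", buffer_bits / 8e6, "MB of buffer")
# A few seconds of delay and a couple of megabytes -- well within what a user will tolerate.
```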

What about real-time, interactive video? For example, a phone call? A video phone call? Or teleconferencing? We can't buffer here, because it will break the interaction. People are sensitive to more than about 1/10th of a second of delay. And, we can get that -- or even more over a long distance -- in propagation latency and other mandatory latencies alone. In these cases, we'll be forced to compromise. If the jitter is too high, resulting in too much jerkiness, we'll have to give up something, somewhere else, to try to fix it. For example, we can lower the quality to decrease how much we're sending, decreasing the load on the network and, hopefully, the queuing delay. Or, we can throw away slower packets, changing the artifact from jerkiness to choppiness. Or, we can consume network time with redundant data, so we can throw away slower copies -- compensating by reducing the size of what we send, for example, sending video with a lower resolution, frame rate, or color depth, or narrowing the frequencies of audio that we send or decreasing the depth of each audio sample.

Variable Quality Streaming

When sending streams, such as audio or video streams, we can choose our quality level, depending upon the data rate of the channel and the level of redundancy needed to mitigate jitter. When it comes to real-time streams with small audiences, such as phone calls or video teleconferencing, we can change the encoding in order to optimize the user experience based upon the observed sustainable data rates and latencies.
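One way this might look, as a hedged sketch: pick the highest quality from an assumed ladder that fits within the observed sustainable data rate, leaving headroom for redundancy (the ladder, rates, and headroom factor are all illustrative assumptions):

```python
# Choose an encoding quality based on the observed sustainable data rate.

QUALITY_LADDER = [            # (label, required bits per second) -- assumed values
    ("1080p", 5_000_000),
    ("720p",  2_500_000),
    ("480p",  1_000_000),
    ("audio-only", 64_000),
]

def pick_quality(observed_rate_bps, redundancy_factor=1.5):
    usable = observed_rate_bps / redundancy_factor   # leave room for redundant copies
    for label, required in QUALITY_LADDER:
        if required <= usable:
            return label
    return "audio-only"

print(pick_quality(4_200_000))   # -> "720p", leaving redundancy headroom
```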

In some cases, we can't optimize in real time. For example, if you pay extra to buy an HDX streaming video from Hulu, you'll be really upset if significant chunks of it are in the lower quality SD. This is one of the reasons why they, instead, will, within reason, pause in the middle to rebuffer if needed -- and, if it gets crazy, offer you the option to downgrade to the lower quality version (even though you paid for more). It is also the reason they test your connection and, if necessary, warn you before you purchase a higher-quality option.

Another reason we may not be able to optimize in real time is that there are too many recipients with varied needs -- it is just computationally not worth the cost.

Just to introduce some vocabulary: the encoding of audio and video streams usually uses lossy compression that can be tuned to vary the tradeoff between quality and size. These algorithms and their implementations are often patented. They are also often implemented in hardware. Encoders are paired with their decoders, and whether implemented in hardware or software, the coder-decoder pair is known as a codec.

Skype

Skype is a very popular VoIP service. I'm not sure who else may have, in some way, been in this space -- but, in reality, Skype was the first VoIP service that operated as an overlay network over the Internet. By overlay network, I mean a logical network overlaid on, or operating over, a physical network which does not necessarily have the same topology.

Skype is proprietary. Its technology is largely closed and not published. For example, the details of the authentication, codecs used, etc, have never been published and are not commonly known (although there has been some reverse engineering and some bits published).

But, the high-level architecture is unique because of the way its capacity grows as its user base grows. Participating hosts, known as super-nodes, act as the relays forming the overlay network. They enable the calls to happen among the users. Hosts behind firewalls, of course, cannot really act as servers in this capacity, so they cannot function as super-nodes. They can't directly exchange messages with each other, so they relay through super-nodes. These degenerate nodes that can't fully participate, and that consume capacity without adding it, are known as nodes rather than super-nodes. In fact, all new participating hosts start out as nodes, and only after demonstrating the capability and stability to add to the capacity of the system are they promoted to super-nodes (donating time, memory, processing, networking, etc., for the common good).

This system historically allowed Skype to grow dramatically with a relatively small investment in infrastructure. They had to scale up authentication, etc, but not the network. It did, however, lead to some management headaches, when, for example, software needed to be rapidly updated on equipment Skype neither owned nor directly managed. And, it, in effect, left the means of production out of Skype's hands.

As a result, Skype has somewhat recently begun maintaining its own super-nodes. Although there were rumors that this was to better enable "The Man" to engage in wire-tapping, etc., etc., etc. -- this is silly. It is reasonably clear, at least to me, that this was an effort to prevent some of the service problems that the "out of their control" model had previously left "out of their control." Such problems included the inability to rapidly update software and too few super-nodes at certain points in time.