Research

YouTube Down for > 5 hours! Why?

In February 2008, one of the most popular websites, YouTube, became unreachable from most of the Internet for more than 5 hours. Packets sent toward YouTube flowed to Pakistan and were dropped there, despite that country's widespread ban on YouTube [news article][link]. This kind of problem, called an "Internet blackhole", has long been known, and smaller-scale incidents of this nature have occurred at regular intervals. In December 2004, an even larger-scale event effectively brought down tens of thousands of networks over 10 hours [link].

A common cause of this kind of problem is errors in network configuration. Configuring a network is an extremely complex, costly, and error-prone task, often compared to writing distributed assembly programs. Our research focuses on automating network configuration through management modules that network operators can deploy and use immediately, so as to have a practical impact on the manageability, reliability, and security of networks.

This focus is well demonstrated by our misconfiguration detection systems, NetPiler, Prometheus, and Minerals, which have been deployed in a large European provider network as well as three other production networks. In total, these systems discovered more than one hundred misconfigurations that were confirmed and corrected by the network operators. NetPiler, Prometheus, and Minerals are described in JSAC 2009 [pdf], DSN PFARM 2009 [pdf], ToN 2009 [pdf], and MineNet 2006 [pdf].

In addition to these misconfiguration detection systems, we have been working on other management modules, Reorganization, Visualization, and Classification, which are being tested on the same networks and have received positive feedback from the network operators. Reorganization improves the readability of network configurations by streamlining the policies they contain. Visualization helps operators better understand their configurations by extracting high-level, intended policies from low-level configuration commands. Classification identifies critical network elements that need to be carefully maintained. These modules are partially introduced in TNSM 2010, DSN 2008 [pdf], and ICC 2008 [pdf].