April 21 and 23, 2007 (Lecture 31)
One of the most important problems in a distributed system is that of
naming. In a large distributed system, objects need unique
identifiers, e.g. names. The names need to be unique, yet because of
scale, can't necessarily be assigned by a single authority. And, these
names need to be well-known, or at least readily knowable. Without these
properties, a distributed system is a world of disconnected islands, not
a functioning community.
One of the most successful distributed systems of all time is the
distributed system that manages names for the Internet, the
Domain Name System (DNS). It is also an excellent example of a
distributed Directory Service.
A directory service is nothing more than it sounds like -- a service
that allows one, given a key, to find an entity. More conventional
directory services include the White Pages and the
Yellow Pages. And that little black book...
The DNS system is a directory service that distributes the names of
all of the hosts in the Internet across the entire Internet, and allows
any host to look up any other host's IP address by name.
And, I can't begin to tell you how much of an improvement this is over
the old system -- which, believe it or not, was to register new systems
with a central authority that added them to a long plain-text list.
Then, periodically, you downloaded a copy of this list and replaced the
copy on your system. Ouch!
Names vs. IP Addresses
- Computers work best with numbers: 184.108.40.206
- Computers categorize systems (interfaces) by network number, subnet
number, host number, &c.
- People like more familiar names: gigo.sp.cs.cmu.edu
- People categorize them by things that are more meaningful to us:
- gigo is a special purpose (sp) computer science (cs) system at
Carnegie Mellon University (cmu), an educational (edu) organization.
- DNS is a distributed database that contains mappings between
hostnames and IP addresses. (It also identifies the mail server for
a host.)
- The information contained within DNS is spread out across the
Internet and stored with Domain Name Servers.
- Hosts query DNS through an application-level program known as the
resolver.
- The resolver contacts the name servers and returns the result to the
requesting host.
- Name servers are generally running the Berkeley Internet Name Domain
(BIND) software, but other compatible name servers do exist.
- Mappings can be cached along the way.
The Name Space
- The name space is hierarchical.
- The root is unnamed. Below this root are the top-level domains.
- Below the root are several domains categorized by the type of
organization. These domains are known as generic domains or
organizational domains: .com, .net, .edu, .org, .biz, .pro, .tv.
- There are also domains categorized by country. These are known as
the country domains, a.k.a. geographical domains. For example, .us,
.nz, .ca, .uk.
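
The hierarchy can be made concrete by splitting a name into its labels and listing the domains that enclose it, from the top-level domain down to the host itself. What follows is only a minimal sketch; the function name `enclosing_domains` is made up for illustration and is not part of any DNS library.

```python
def enclosing_domains(hostname):
    """Return the domains that contain hostname, from the TLD downward."""
    # Drop a trailing dot (the unnamed root) and split into labels.
    labels = hostname.rstrip(".").lower().split(".")
    # Build suffixes from shortest (the top-level domain) to the full name.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


for domain in enclosing_domains("gigo.sp.cs.cmu.edu"):
    print(domain)
```

Each printed line is one step further from the root: edu, then cmu.edu, then cs.cmu.edu, and so on down to the host.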
Top Level Domains
- The Network Information Center (NIC) is responsible for the top-level
domains.
- NIC maintains the top-level domains, but delegates the authority for
the other domains to their owners.
- These separately administered domains are known as zones.
- Zones can be divided into smaller zones, e.g. cs.cmu.edu within cmu.edu.
Primary and Secondary Name Servers
- Each zone designates a primary name server and zero or more
secondary name servers.
- The system administrator keeps the primary name server up-to-date.
- The secondary server(s) periodically, typically every 3 hours,
query the primary name server and update their information. This
process is known as a zone transfer.
- In the event that the primary name server should fail, the secondary
name server can satisfy requests.
Root Name Servers
- There are thirteen root name servers.
- They are named A.ROOT-SERVERS.NET through M.ROOT-SERVERS.NET
- They maintain a list of all of the second-level name servers, e.g.,
the name server for .cmu.edu. (Note: First level would be .edu).
- Each name server must know how to contact each of these root name
servers.
- Queries to root name servers are iterative, i.e. non-recursive. (This is
set with a flag.)
- They do not return the IP address. Instead, they return the addresses of
the authoritative name servers for that zone.
- The resolver can then contact one of these servers.
- Queries to non-root name servers can be recursive.
- In other words, we can ask them to look up the name for us, if they
are not authoritative.
- In response to a recursive query, the queried name server will
contact the other name server itself and ask for the response.
- That server will in turn do the same.
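
The difference between the two query styles can be sketched with a toy model. In the iterative case the resolver follows each referral itself; in the recursive case the queried server chases the referral on the client's behalf. Everything here is an assumption for illustration: the server names, the `SERVERS` table, and the function names are invented, and a real name server speaks the DNS wire protocol rather than reading a Python dictionary.

```python
# Toy name servers. Each knows some hosts directly and refers other zones
# to another server. All names and data are made up for illustration.
SERVERS = {
    "root": {"referrals": {"cmu.edu": "ns.cmu.edu"}, "hosts": {}},
    "ns.cmu.edu": {"referrals": {"cs.cmu.edu": "ns.cs.cmu.edu"}, "hosts": {}},
    "ns.cs.cmu.edu": {
        "referrals": {},
        "hosts": {"gigo.sp.cs.cmu.edu": "128.2.209.251"},
    },
}


def query(server, name):
    """Answer one non-recursive query: an address, a referral, or a failure."""
    data = SERVERS[server]
    if name in data["hosts"]:
        return ("answer", data["hosts"][name])
    # Prefer the longest (most specific) matching zone suffix.
    for zone in sorted(data["referrals"], key=len, reverse=True):
        if name == zone or name.endswith("." + zone):
            return ("referral", data["referrals"][zone])
    return ("unknown", None)


def resolve_iteratively(name, start="root", max_hops=10):
    """The resolver follows referrals itself, one server at a time."""
    server, path = start, []
    for _ in range(max_hops):
        path.append(server)
        kind, value = query(server, name)
        if kind != "referral":
            return value, path
        server = value
    raise RuntimeError("too many referrals")


def resolve_recursively(server, name):
    """The queried server contacts the next server and relays the result."""
    kind, value = query(server, name)
    if kind == "referral":
        return resolve_recursively(value, name)
    return value


address, path = resolve_iteratively("gigo.sp.cs.cmu.edu")
print(address, "via", " -> ".join(path))
```

Both functions reach the same answer. What differs is who does the legwork, which is why heavily loaded servers such as the roots prefer the iterative style.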
- DNS servers employ a cache.
- They cache not only positive responses, such as mappings and "look
- They also cache failures, e.g., "unknowns". This is called negative
caching.
- The authoritative server contains a Time To Live (TTL) value in
seconds for each entry, and a default. No server can cache
information beyond its TTL.
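
The caching rules above can be sketched as a small table keyed by name, where each entry carries its own expiry time and a failed lookup is stored just like a successful one. This is a minimal sketch, not how BIND or any real server is implemented; the class `DnsCache` and its methods are hypothetical, and the clock is injectable only so that expiry can be demonstrated without waiting.

```python
import time


class DnsCache:
    """Toy cache for lookups; entries expire when their TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # name -> (value, expiry time)

    def put(self, name, value, ttl_seconds):
        # value=None records a failed lookup (negative caching).
        self._entries[name] = (value, self._clock() + ttl_seconds)

    def get(self, name):
        """Return (hit, value); hit is False if absent or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return (False, None)
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]   # never serve data past its TTL
            return (False, None)
        return (True, value)
```

A hit with a value of `None` means "we already asked, and the name is unknown", which saves a repeat trip to the authoritative server until that entry's TTL expires.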
- When an organization becomes authoritative for a domain, they get not
only their namespace, but a portion of the in-addr.arpa name space.
- This name space is used for IP address to name mappings.
- It is organized by the reverse of the IP address's dotted decimal
notation.
- For example, GIGO's in-addr.arpa name is 251.209.2.128.in-addr.arpa.
- By reversing the bytes of the IP address, the reverse query becomes
possible without an exhaustive search.
- An IP-address-->name query is known as a pointer query.
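
Building the in-addr.arpa name is a matter of reversing the four bytes of the address and appending the suffix. A minimal sketch, assuming IPv4 only; the function name `reverse_name` is made up for illustration.

```python
import ipaddress


def reverse_name(ipv4):
    """Build the in-addr.arpa name for a dotted-decimal IPv4 address."""
    # Validate the address first; raises ValueError on malformed input.
    address = ipaddress.IPv4Address(ipv4)
    octets = str(address).split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(reverse_name("128.2.209.251"))  # 251.209.2.128.in-addr.arpa
```

Because the most significant byte ends up nearest the in-addr.arpa root, the reversed name follows the same delegation hierarchy as the address blocks themselves.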
- A(Address): Defines the IP address of a host
- gigo.sp.cs.cmu.edu IN A 128.2.209.251
- CNAME (canonical name): associates an alias with the canonical (primary) name of a host
- ftp.gigo.sp.cs.cmu.edu IN CNAME gigo.sp.cs.cmu.edu
- HINFO (Host info): Specifies information about a particular host, such as CPU type and OS version.
- gigo.sp.cs.cmu.edu IN HINFO RH6.0/i386
- MX (mail exchange): Specifies the server that handles mail for a host
- gigo.sp.cs.cmu.edu IN MX 0 ux8.sp.cs.cmu.edu
- gigo.sp.cs.cmu.edu IN MX 10 smtp.andrew.cmu.edu
- PTR (pointer): Provides the reverse mapping for pointer queries
- 251.209.2.128.in-addr.arpa IN PTR gigo.sp.cs.cmu.edu
- Plenty more
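
Records written in the form above are easy to pick apart. The sketch below splits each line into owner, type, and data, then orders a host's MX records by their preference value; lower values are tried first, which is why the MX examples carry the numbers 0 and 10. The helper names `parse_record` and `mail_servers` are hypothetical, and the parser handles only the simple one-line form shown in these notes, not full zone-file syntax.

```python
def parse_record(line):
    """Split 'owner IN TYPE data...' into (owner, type, data fields)."""
    owner, rr_class, rr_type, *data = line.split()
    if rr_class != "IN":
        raise ValueError("only the Internet (IN) class is handled here")
    return owner, rr_type, data


def mail_servers(lines, host):
    """Return the host's mail exchangers, most preferred first."""
    exchangers = []
    for line in lines:
        owner, rr_type, data = parse_record(line)
        if owner == host and rr_type == "MX":
            preference, server = int(data[0]), data[1]
            exchangers.append((preference, server))
    # Lower preference values are tried first.
    return [server for _, server in sorted(exchangers)]


RECORDS = [
    "gigo.sp.cs.cmu.edu IN MX 10 smtp.andrew.cmu.edu",
    "gigo.sp.cs.cmu.edu IN MX 0 ux8.sp.cs.cmu.edu",
    "251.209.2.128.in-addr.arpa IN PTR gigo.sp.cs.cmu.edu",
]

print(mail_servers(RECORDS, "gigo.sp.cs.cmu.edu"))
```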
X.500 and LDAP
DNS is an effective directory service -- but it only solves one very
small slice of the pie. It handles DNS queries, and nothing else.
It holds DNS information and (almost) nothing else. X.500 is a
directory service designed to solve the more general problem.
It is a standard in the sense that it is defined by the ITU and ISO. Its
specification reads more-or-less like a network protocol, and leaves
the implementation to the implementor. Only the interfaces
and behaviors are specified.
My point in discussing it isn't to go into a detailed discussion of
yet another "standard by abbreviation-enabled committee". Instead,
it is just to observe its similarity in design to DNS and to reinforce
the idea that DNS is, in my estimation, the most successful distributed
system, ever -- and a great system to consider as a model.
The collection of information stored in the X.500 space is known as the
Directory Information Base (DIB). This information is organized
in the form of a distributed tree known as the Directory
Information Tree (DIT). The nodes of this tree are the X.500 servers
run by various organizations. These servers are known as Directory
System Agents (DSAs). Clients are, no surprise, known as
Directory User Agents (DUAs).
The DIT, which is composed of the DSAs, is organized much as is DNS.
It is distributed among the hosts, and has an unnamed root. Much
as DNS uses dot-separated hierarchical names, X.500 uses hierarchical
names written in a notation similar to a URL or directory path. The name
is the path from the root to the node. This graph is more-or-less a
hierarchical tree, which begins with the unnamed root, then moves to the
country, then the organization, then the division, and so on.
Each record consists of a collection of attributes and values. The type of
each attribute must be specified, and must be one of many defined by
a standard. A search is completed by searching the appropriate node of the
tree -- the node named by the full path.
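
The lookup-by-path idea can be sketched with a nested dictionary standing in for the tree. This is a toy model under loud assumptions: the entries, the attribute names, and the phone number are invented, the whole tree sits in one process rather than being spread across servers, and the `lookup` function is hypothetical rather than any real X.500 or LDAP call.

```python
# A toy directory tree. Each node has attributes and named children.
# The entries and attribute values below are invented for illustration.
DIT = {
    "attributes": {},
    "children": {
        "c=US": {
            "attributes": {"countryName": "United States"},
            "children": {
                "o=Carnegie Mellon University": {
                    "attributes": {
                        "organizationName": "Carnegie Mellon University",
                    },
                    "children": {
                        "ou=Computer Science": {
                            "attributes": {"telephoneNumber": "555-0100"},
                            "children": {},
                        },
                    },
                },
            },
        },
    },
}


def lookup(tree, path):
    """Walk from the unnamed root along path; return attributes or None."""
    node = tree
    for component in path:
        node = node["children"].get(component)
        if node is None:
            return None
    return node["attributes"]


print(lookup(DIT, ["c=US", "o=Carnegie Mellon University", "ou=Computer Science"]))
```

As with DNS, the full name doubles as directions: each component of the path selects the next child to visit.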
The Lightweight Directory Access Protocol (LDAP) is an interface
to X.500 that, by definition, runs directly over TCP/IP. And,
as indicated by the name, it is "light-weight" -- it eliminates much of
the bulk that resulted from satisfying "the committee".
Although LDAP was designed to provide a nice interface to X.500, it
can technically be used with any database that provides the minimal
functionality that it needs and that has a compatible interface.
These days, LDAP is probably best known for University
faculty/staff/student directory services and OS login databases.