April 21 and 23, 2007 (Lecture 31)

Naming

One of the most important problems in a distributed system is that of naming. In a large distributed system, objects need unique identifiers, e.g. names. The names need to be unique, yet because of scale, can't necessarily be assigned by a single authority. And, these names need to be well-known, or at least readily knowable. Without these properties, a distributed system is a world of disconnected islands, not a functioning community.
One of the most successful distributed systems of all time is the distributed system that manages names for the Internet, the Domain Name System (DNS). It is also an excellent example of a distributed Directory Service.
A directory service is just what it sounds like -- a service that allows one, given a key, to find an entity. More conventional directory services include the White Pages and the Yellow Pages. And that little black book...
DNS is a directory service that distributes the names of all of the hosts in the Internet across the entire Internet, and allows any host to look up any other host's IP address by name, or vice-versa.
And, I can't begin to tell you how much of an improvement this is over the old system -- which, believe it or not, was to register new systems with a central authority that added them to a long plain-text list. Then, periodically, each site downloaded a copy of this list and updated the copy on its own systems. Ouch!

Names vs. IP Addresses

Computers work best with numbers: 128.2.209.251
Computers categorize systems (interfaces) by network number, subnet number, host number, &c.
People like more familiar names: gigo.sp.cs.cmu.edu
People categorize systems by things that are more meaningful to us:
gigo is a special purpose (sp) computer science (cs) system at Carnegie Mellon University (cmu), an educational (edu) organization.

The Pieces

DNS is a distributed database that contains mappings between hostnames and IP addresses. (It also identifies the mail server for each entry).
The information contained within DNS is spread out across the Internet and stored with Domain Name Servers.
Hosts query DNS through an application-level program known as the resolver.
The resolver contacts the name servers and returns the result to the application.
Name servers generally run the Berkeley Internet Name Domain (BIND) software, but other compatible name servers do exist.
Mappings can be cached along the way.

The Name Space

The name space is hierarchical.
The root is unnamed. Below this root are the top-level domains.
Below the root are several domains categorized by the type of organization. These domains are known as generic domains or organizational domains: .com, .net, .edu, .org, .biz, .pro, .tv, .web, &c.
There are also domains categorized by country. These are known as the country domains, a.k.a. geographical domains. For example: .us, .nz, .ca, .uk
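
To make the hierarchy concrete, here is a minimal Python sketch that walks a host name from the top-level domain down to the full name. The function name domain_path is made up for illustration; it only manipulates strings and does not contact any name server.

```python
def domain_path(hostname):
    """Return the chain of domains from the top-level domain down to the full name."""
    # Drop an optional trailing dot (the unnamed root) and normalize case.
    labels = hostname.rstrip(".").lower().split(".")
    path = []
    # Walk the labels right to left: the rightmost label is closest to the root.
    for i in range(len(labels) - 1, -1, -1):
        path.append(".".join(labels[i:]))
    return path


print(domain_path("gigo.sp.cs.cmu.edu"))
# ['edu', 'cmu.edu', 'cs.cmu.edu', 'sp.cs.cmu.edu', 'gigo.sp.cs.cmu.edu']
```

Each entry in the list is one step further from the root, which is the order in which authority is delegated.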

Top Level Domains

The Network Information Center (NIC) is responsible for the top-level domains.
NIC maintains the top-level domains, but delegates the authority for the other domains to their owners.
These separately administered domains are known as zones.
Zones can be divided into zones, e.g. cs.cmu.edu

Primary and Secondary Name Servers

Each zone designates a primary name server and zero or more secondary name servers.
The system administrator keeps the primary name server up-to-date.
The secondary server(s) periodically, typically every 3 hours, query the primary name server and update their information. This process is known as a zone transfer.
In the event that the primary name server should fail, the secondary name server can satisfy requests.

Root Name Servers

There are thirteen root name servers.
They are named A.ROOT-SERVERS.NET through M.ROOT-SERVERS.NET
They maintain a list of all of the second-level name servers, e.g., the name server for .cmu.edu. (Note: First level would be .edu).
Each name server must know how to contact each of these root name servers.

Iterative Queries

Queries to root name servers are iterative, i.e., non-recursive (this is set with a flag in the query).
They do not return the IP address. Instead, they return the addresses of the authoritative name servers for that zone.
The resolver can then contact one of these servers.
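
The iterative pattern can be sketched with a toy, in-memory model. Everything here is an assumption made for illustration: the server names ("root", "cmu-ns", "cs-ns"), the SERVERS table, and the helper functions are invented, and no real DNS messages are exchanged. The point is only that the resolver itself follows each referral.

```python
# Hypothetical name servers: each holds referrals for zones it delegates
# and addresses for names it is authoritative for. All data is invented.
SERVERS = {
    "root": {"referrals": {"cmu.edu": "cmu-ns"}, "addresses": {}},
    "cmu-ns": {"referrals": {"cs.cmu.edu": "cs-ns"}, "addresses": {}},
    "cs-ns": {"referrals": {}, "addresses": {"gigo.sp.cs.cmu.edu": "128.2.209.251"}},
}


def query(server, name):
    """Answer one non-recursive query: an address, a referral, or a failure."""
    data = SERVERS[server]
    if name in data["addresses"]:
        return ("answer", data["addresses"][name])
    # Try the longest suffix of the name first, then shorter ones.
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in data["referrals"]:
            return ("referral", data["referrals"][suffix])
    return ("unknown", None)


def resolve_iteratively(name, start="root"):
    """Follow referrals from server to server; the resolver does all the asking."""
    server = start
    visited = [server]
    while True:
        kind, value = query(server, name)
        if kind == "answer":
            return value, visited
        if kind == "unknown":
            return None, visited
        server = value  # a referral: contact the named server next
        visited.append(server)


print(resolve_iteratively("gigo.sp.cs.cmu.edu"))
```

Note that no server ever contacts another server in this model; each one just tells the resolver where to look next.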

Recursive Queries

Queries to non-root name servers can be recursive.
In other words, we can ask them to look up the name for us, if they are not authoritative.
In response to a recursive query, the queried name server will contact the other name server itself and ask for the response.
That server will in turn do the same.
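
For contrast, here is a toy sketch of the recursive pattern, in which each server does the asking on the client's behalf. As before, the server names, the SERVERS table, and the forward_to field are hypothetical, invented only to show the shape of the call chain.

```python
# Hypothetical name servers. Each one either knows the address or knows
# which other server to ask next. All names and data are invented.
SERVERS = {
    "local-ns": {"addresses": {}, "forward_to": "cmu-ns"},
    "cmu-ns": {"addresses": {}, "forward_to": "cs-ns"},
    "cs-ns": {"addresses": {"gigo.sp.cs.cmu.edu": "128.2.209.251"}, "forward_to": None},
}


def recursive_query(server, name, trace=None):
    """Ask `server` for `name`; if it cannot answer, it asks the next server itself."""
    if trace is None:
        trace = []
    trace.append(server)
    data = SERVERS[server]
    if name in data["addresses"]:
        return data["addresses"][name], trace
    if data["forward_to"] is None:
        return None, trace  # nobody left to ask
    # The server, not the client, makes the next query and relays the result.
    return recursive_query(data["forward_to"], name, trace)


print(recursive_query("local-ns", "gigo.sp.cs.cmu.edu"))
```

The client makes exactly one request here, to "local-ns", and gets back the final answer.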

Caching

DNS servers employ a cache.
They cache positive responses, such as mappings and "look heres" (referrals to other name servers).
They also cache failures, e.g., "unknowns". This is called negative caching.
The authoritative server supplies a Time To Live (TTL) value in seconds for each entry, and a default. No server can cache information beyond its TTL.
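
A minimal sketch of such a cache, assuming a made-up DnsCache class and an injectable clock so that expiry can be demonstrated without waiting. A cached failure is represented here by storing None as the address; that representation is a choice made for this example, not something DNS prescribes.

```python
import time


class DnsCache:
    """Toy cache with per-entry TTLs and negative caching (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address or None, expiry time)

    def put(self, name, address, ttl):
        """Store a positive answer, or a negative one when address is None."""
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        """Return (hit, address). A hit whose address is None is a cached failure."""
        entry = self._entries.get(name)
        if entry is None:
            return (False, None)
        address, expires_at = entry
        if self._clock() >= expires_at:
            # The TTL has run out: forget the entry and report a miss.
            del self._entries[name]
            return (False, None)
        return (True, address)


# Usage with a fake clock so the example runs instantly.
now = [0.0]
cache = DnsCache(clock=lambda: now[0])
cache.put("gigo.sp.cs.cmu.edu", "128.2.209.251", ttl=60)
cache.put("nosuch.cmu.edu", None, ttl=30)  # negative caching
print(cache.get("gigo.sp.cs.cmu.edu"))
now[0] = 45.0
print(cache.get("nosuch.cmu.edu"))  # the negative entry has expired by now
```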

Pointer Queries

When an organization becomes authoritative for a domain, they get not only their namespace, but a portion of the in-addr.arpa name space.
This name space is used for IP address to name mappings.
It is organized by the reverse of the IP address's dotted decimal notation.
For example, GIGO's in-addr.arpa name is 251.209.2.128.in-addr.arpa.
By reversing the bytes of the IP address, the reverse query becomes possible without an exhaustive search.
An IP address-->name query is known as a pointer query.
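
The reversal is easy to show in code. This short Python sketch builds the in-addr.arpa name for an IPv4 address by hand; the function name reverse_name is made up for this example. The standard library's ipaddress module offers the same thing through the reverse_pointer attribute, which is printed alongside for comparison.

```python
import ipaddress


def reverse_name(ip):
    """Build the in-addr.arpa name used to look up an IPv4 address."""
    # Validate first: raises ValueError for anything that is not an IPv4 address.
    ipaddress.IPv4Address(ip)
    octets = ip.split(".")
    # The most specific byte comes first, mirroring how host names are written.
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(reverse_name("128.2.209.251"))
# 251.209.2.128.in-addr.arpa
print(ipaddress.ip_address("128.2.209.251").reverse_pointer)
# 251.209.2.128.in-addr.arpa
```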

Resource Records

A (address): Defines the IP address of a host
- gigo.sp.cs.cmu.edu IN A 128.2.209.251
CNAME (canonical name): associates an alias with the canonical (primary) name of the owner
- ftp.gigo.sp.cs.cmu.edu IN CNAME gigo.sp.cs.cmu.edu
HINFO (Host info): Specifies information about a particular host, such as CPU type and OS version.
- gigo.sp.cs.cmu.edu IN HINFO i386 RH6.0
MX (mail exchange): Specifies the server that handles mail for a host
- gigo.sp.cs.cmu.edu IN MX 0 ux8.sp.cs.cmu.edu
- gigo.sp.cs.cmu.edu IN MX 10 smtp.andrew.cmu.edu
PTR (pointer): provides the reverse mapping for pointer queries
- 251.209.2.128.in-addr.arpa IN PTR gigo.sp.cs.cmu.edu
Plenty more
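
As an illustration of how regular these records are, here is a small Python sketch that parses lines in the "name class type data" layout used above. The ResourceRecord tuple and parse_record function are invented for this example and handle only this simplified layout, not real zone-file syntax (no TTL field, comments, or quoting).

```python
from collections import namedtuple

# Hypothetical record type for this example only.
ResourceRecord = namedtuple("ResourceRecord", "name rr_class rr_type data")


def parse_record(line):
    """Parse one 'name class type data...' line into a ResourceRecord."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected: name class type data")
    name, rr_class, rr_type = fields[0], fields[1], fields[2]
    rest = fields[3:]
    if rr_type == "MX":
        if len(rest) != 2:
            raise ValueError("MX needs a preference and a mail server name")
        # MX data is a (preference, mail server) pair.
        data = (int(rest[0]), rest[1])
    else:
        data = " ".join(rest)
    return ResourceRecord(name, rr_class, rr_type, data)


records = [
    parse_record("gigo.sp.cs.cmu.edu IN A 128.2.209.251"),
    parse_record("gigo.sp.cs.cmu.edu IN MX 10 smtp.andrew.cmu.edu"),
    parse_record("gigo.sp.cs.cmu.edu IN MX 0 ux8.sp.cs.cmu.edu"),
]
# Mail servers are tried in order of preference value, lowest first.
mail_servers = sorted(r.data for r in records if r.rr_type == "MX")
print(mail_servers)
```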

X.500 and LDAP

DNS is an effective directory service -- but it only solves one very small slice of the pie. It handles DNS queries, and nothing else. It holds DNS information and (almost) nothing else. X.500 is a directory service designed to solve the more general problem.
It is a standard in the sense that it is defined by the ITU and ISO. Its specification reads more-or-less like a network protocol, and leaves the implementation to the implementor. Only the interfaces and behaviors are defined.
My point in discussing it isn't to go into a detailed discussion of yet another "standard by abbreviation-enabled committee". Instead, it is just to observe its similarity in design to DNS and to reinforce the idea that DNS is, in my estimation, the most successful distributed system ever -- and a great system to consider as a model.
The collection of information stored in the X.500 space is known as the Directory Information Base (DIB). This information is organized in the form of a distributed tree known as the Directory Information Tree (DIT). The nodes of this tree are the X.500 servers run by various organizations. These servers are known as Directory System Agents (DSAs). Clients are, no surprise, known as Directory User Agents (DUAs).
The DIT, which is composed of the DSAs, is organized much as is DNS. It is distributed among the hosts, and has an unnamed root. Much as DNS uses dot-separated hierarchical names, X.500 uses hierarchical names written in a notation similar to a URL or directory path. The name is the path from the root to the node. This graph is more-or-less a hierarchical tree, which begins with the unnamed root, then moves to the country, then the organization, then the division, and so on.
Each record consists of a collection of attributes and values. The type of each attribute must be specified, and must be one of many defined by a standard. A search is completed by searching the appropriate node of the tree -- the node named by the full path.
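
The path-based lookup can be sketched with a toy tree. This is only a model of the idea: the nested-dictionary layout, the slash-separated path syntax, the node names, and the attribute names are all invented for illustration and are not X.500's actual naming syntax or attribute types.

```python
# Hypothetical directory tree: unnamed root, then country, organization, division.
# Node names and attributes are made up for this example.
DIT = {
    "attributes": {},
    "children": {
        "US": {
            "attributes": {"kind": "country"},
            "children": {
                "CMU": {
                    "attributes": {"kind": "organization"},
                    "children": {
                        "CS": {
                            "attributes": {"kind": "division", "description": "Computer Science"},
                            "children": {},
                        }
                    },
                }
            },
        }
    },
}


def lookup(path):
    """Return the attributes of the node named by `path`, or None if it is absent."""
    node = DIT
    # The name is the path from the root to the node, one component per level.
    for component in path.strip("/").split("/"):
        node = node["children"].get(component)
        if node is None:
            return None
    return node["attributes"]


print(lookup("/US/CMU/CS"))
print(lookup("/US/NoSuchOrg"))
```

In a real deployment each subtree would live on a different server, so a lookup would hop between servers much as a DNS query does.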
The Lightweight Directory Access Protocol (LDAP) is an interface to X.500 that runs directly over, and by definition relies on, TCP/IP. And, as indicated by the name, it is "light-weight" -- it eliminates much of the bulk that resulted from satisfying "the committee".
Although LDAP was designed to provide a nice interface to X.500, it can technically be used with any database that provides the minimal functionality that it needs and that has a compatible interface.
These days, LDAP is probably best known for University faculty/staff/student directory services and OS login databases.