The Domain Name System (DNS)

The Domain Name System from the layman's perspective
An Overview of the DNS
Elements of the DNS
Name space specifications and terminology
Resource Records
Aliases and canonical names
Queries
Name Servers
Resolvers
How is "people.csa.iisc.ernet.in" resolved ?
How do you configure the resolver in Linux ?
Tools to query name server in Linux
What port numbers are used for name server access ?
Further References

The Domain Name System from the layman's perspective

Every computer on the Internet has an IP address (similar to a telephone number), because the Internet infrastructure only understands IP addresses. IPv4 addresses are 32-bit numbers, usually written as four decimal numbers separated by dots. Fortunately, when we access the web we specify a domain name instead of an IP address. For example, the URL "http://www.google.com" contains the domain name www.google.com. Imagine how difficult it would be if we had to type 216.239.41.99 (the IP address) instead of www.google.com (the domain name). If that is not convincing enough, imagine having to remember the IP address of every website you visit.

The Domain Name System (DNS) is the system that allows us to use a familiar string of letters (the "domain name") instead of the arcane IP address. The DNS translates the domain name into the IP address behind our backs and makes life easy for us. It also makes sure that the email that you send to someone@hotmail.com reaches the correct IP address.

An Overview of the DNS

The DNS is a hierarchical structure, somewhat like an inverted tree. At the very top is the root, below which are the various Top Level Domains (TLDs), including generic ones such as org, com, net, edu, info and biz, and country-level domains such as in (India), fi (Finland) and ca (Canada). Below these are second-level domains such as yahoo, hotmail, ernet and google. There may be up to 127 levels. The levels are separated by a dot ("."). The lowest level (the one farthest from the root) is written first. "people.csa.iisc.ernet.in" is an example of a five-level domain name.

This tree structure is divided into "zones", each of which has its own "name server". Name servers are computers which remember the host name to IP address mappings of all the hosts in a zone. They also know the addresses of the name servers for their subdomains. So in essence, the name server for a particular zone has information that can lead you to any host that lies below it in the tree. If it does not have the IP address of that host, it can give you the address of a name server that may have it. At the heart of the DNS are the 13 root servers, which are the name servers for the root of the tree and know the name servers of every TLD. The servers are uniquely named A, B, C and so on up to M. At the time of writing, ten of these servers reside in the United States, one in Japan, one in London, and one in Stockholm, Sweden. Every name server knows the IP addresses of the root servers.

Every host computer has a program called a "resolver" that takes the domain name from the user and converts it into an IP address. The resolver has the IP addresses of a few DNS name servers which it can query. Suppose you wanted to find the IP address of www.google.com. You would ask your resolver, which in turn would ask one of its known name servers:

Do you know the IP address of www.google.com ?

Assuming that the name server does not know the answer, it would reply:

I don't know the IP address of www.google.com, but you could ask the root server, here is its address.

The resolver would now ask the root server the same question:

Do you know the IP address of www.google.com ?

The root server would reply:

I don't know the IP address of www.google.com, but you could ask the name server for the com domain, here is its address.

The resolver would now query the name server for the com domain. That name server would reply:

I don't know the IP address of www.google.com, but you could ask the name server for the google domain, here is its address.

The resolver would now query the google name server, which would reply:

I know the IP address of www.google.com, it is 216.239.41.99.

The resolver would return this IP address to the user and cache it in its local memory, so that it does not have to ask so many questions if the user wants the same IP address again. We see that if one DNS server does not know how to translate a particular domain name, another one is asked, and so on, until the correct IP address is returned.
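
The conversation above can be sketched as a small Python simulation. This is only a toy model: the server names ("local", "root", "com-server", "google-server"), the table layout and the helper functions are assumptions made for illustration, and no real network queries are sent.

```python
# Toy model of the referral chain. Each hypothetical server either knows the
# answer or knows which server to ask next for a given domain suffix.
TOY_SERVERS = {
    "local": {"answers": {}, "referrals": {"": "root"}},
    "root": {"answers": {}, "referrals": {"com": "com-server"}},
    "com-server": {"answers": {}, "referrals": {"google.com": "google-server"}},
    "google-server": {"answers": {"www.google.com": "216.239.41.99"}, "referrals": {}},
}


def find_referral(server, name):
    """Return the server named by the longest matching suffix, or None."""
    best_suffix, best_target = None, None
    for suffix, target in TOY_SERVERS[server]["referrals"].items():
        matches = suffix == "" or name == suffix or name.endswith("." + suffix)
        if matches and (best_suffix is None or len(suffix) > len(best_suffix)):
            best_suffix, best_target = suffix, target
    return best_target


def resolve(name, cache, start="local"):
    """Follow referrals until some server answers; remember the answer."""
    if name in cache:
        return cache[name], []          # answered from the cache, nobody asked
    asked = []
    server = start
    while server is not None:
        asked.append(server)
        address = TOY_SERVERS[server]["answers"].get(name)
        if address is not None:
            cache[name] = address
            return address, asked
        server = find_referral(server, name)
    return None, asked                  # no server could help
```

Calling resolve("www.google.com", cache) with an empty dictionary as the cache walks all four servers. A second call with the same cache returns the address without asking anyone.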

Elements of the DNS

The DNS has three major components:

The Domain Name Space and Resource Records, which are specifications for a tree structured name space and data associated with the names. Each node in the domain name space has some information associated with it, for example its IP address and the Mail Exchangers. This information is contained in Resource Records.
The Name Servers are server programs which hold information about the domain tree's structure. In general, a particular name server has complete information about a subset of the domain space. A name server is said to be an Authority for the parts of the name space for which it has complete information.
The Resolvers are programs that extract information from name servers in response to client requests. Resolvers must be able to access at least one name server and use that name server's information to answer a query directly, or pursue the query using referrals to other name servers.

Name space specifications and terminology

The domain name space is a tree structure. Each node on the tree has a label (node refers to both interior nodes and leaves). The domain name of a node is the list of the labels on the path from the node to the root of the tree. By convention, the labels that compose a domain name are printed or read left to right, from the most specific (lowest, farthest from the root) to the least specific (highest, closest to the root). When a user needs to type a domain name, the labels are separated by dots (".").

Domain name comparisons for all present domain functions are done in a case-insensitive manner. That is, two names with the same spelling but different case are to be treated as if identical.

We distinguish between absolute and relative domain names:

Absolute - a character string which represents a complete domain name, for example "opal.csa.iisc.ernet.in.". Note that an absolute domain name ends with a ".".
Relative - a character string that represents the starting labels of a domain name which is incomplete, and which should be completed by local software using knowledge of the local domain. For example, when we say opal in the Computing Lab at IISc, the local software automatically appends csa.iisc.ernet.in to make it an absolute domain name. Another example is that we can simply write google in the web browser and it will automatically go to www.google.com.
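
The two rules above (case-insensitive comparison and completion of relative names) can be sketched in Python as follows. The function names are hypothetical, and treating a missing trailing dot as insignificant when comparing is a simplifying assumption. Real resolvers apply more elaborate search rules.

```python
def is_absolute(name):
    """An absolute domain name ends with a dot."""
    return name.endswith(".")


def complete(name, local_domain):
    """Turn a relative name into an absolute one using the local domain."""
    if is_absolute(name):
        return name
    return name + "." + local_domain.rstrip(".") + "."


def same_name(a, b):
    """Compare two domain names without regard to case.

    Assumption: a trailing dot is ignored for the comparison.
    """
    return a.rstrip(".").lower() == b.rstrip(".").lower()
```

For instance, complete("opal", "csa.iisc.ernet.in") gives "opal.csa.iisc.ernet.in.", while a name that already ends in a dot is returned unchanged.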

Resource Records

A domain name identifies a node. Each node has a set of resource information associated with it. The set of resource information associated with a particular name is composed of separate resource records (RRs).

Each Resource Record (RR) has the following:

Owner	This field defines what domain name applies to the given RR.
Type	A value that specifies the type of data/resource that is there in this resource record. The possible values and their meanings are:
	A	A host IP address
	CNAME	Identifies its owner name as an alias and gives its canonical name
	HINFO	Identifies the CPU and OS used by a host
	MX	Identifies a mail exchange for the domain (the machine to which mail for a domain name should be delivered)
	NS	The authoritative name server for the domain
	PTR	A pointer to another part of the domain name space
	SOA	The Start Of Authority record designates the start of a zone. The zone ends at the next SOA record.
Class	A value which identifies a protocol family. The possible values are:
	IN	the Internet system
	CH	the Chaos system
TTL	TTL stands for Time To Live. It specifies how long, in seconds, a resolver may cache the RR before discarding it and asking a name server again.
RDATA	The actual data stored in the resource record. The data field is defined differently for each type and class of data. The data stored for the various types is :
	A	For the IN class, a 32 bit IP address in dotted decimal form.
	CNAME	The canonical domain name corresponding to this alias.
	HINFO	The HINFO record gives information about a particular host. The data is two strings separated by whitespace. The first string is a hardware description and the second is software. The hardware is usually a manufacturer name followed by a dash and model designation. The software string is usually the name of the operating system.
	MX	A 16-bit preference value (lower is better) followed by a host name willing to act as a mail exchange for the owner domain.
	NS	The host name of a machine that provides domain service for this particular domain.
	PTR	A domain name. PTR records are mainly used for translating IP addresses to names.
	SOA	Several fields giving details about the zone:
		MNAME	The name of the name server that is the primary source of data for the zone.
		RNAME	A mailbox for the person responsible for the zone. It is formatted like a mailing address, but the at-sign that normally separates the user from the host name is replaced with a dot.
		SERIAL	The version number of the zone file. It should be incremented any time a change is made to data in the zone.
		REFRESH	How often, in seconds, a secondary name server checks with the primary name server to see if an update is needed.
		RETRY	How long, in seconds, a secondary name server waits before retrying after a failed refresh.
		EXPIRE	The upper limit, in seconds, for which a secondary name server may use the data before it expires for lack of a refresh.
		MINIMUM	The minimum number of seconds to be used for TTL values in RRs.
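
To make the fields concrete, here is a small Python sketch that splits the textual form of an RR, as printed by tools like dig, into the five parts described above. The ResourceRecord class and the parse_rr function are hypothetical names, and the sketch assumes that all five fields are present on the line in the order owner, TTL, class, type, RDATA.

```python
from dataclasses import dataclass


@dataclass
class ResourceRecord:
    owner: str      # the domain name the record applies to
    ttl: int        # time to live, in seconds
    rr_class: str   # protocol family, for example IN
    rr_type: str    # kind of data, for example A, CNAME or MX
    rdata: str      # type-specific data, kept as text here


def parse_rr(line):
    """Parse one 'owner ttl class type rdata' line into a ResourceRecord."""
    # Split on whitespace at most four times so that RDATA containing
    # spaces (for example an MX preference and host name) stays together.
    owner, ttl, rr_class, rr_type, rdata = line.split(None, 4)
    return ResourceRecord(owner, int(ttl), rr_class, rr_type, rdata.strip())
```

An MX record keeps its preference value and host name together in the rdata field, since its data consists of two parts.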

Aliases and canonical names

In existing systems, hosts and other resources often have several names that identify the same resource. For example, the domain names "cl.csa.iisc.ernet.in", "people.csa.iisc.ernet.in" and "deimos.csa.iisc.ernet.in" refer to a single machine. Most of these systems have a notion that one of the equivalent set of names is the canonical or primary name and all others are aliases. For example, in the above case, "deimos.csa.iisc.ernet.in" is the canonical name while the other two are aliases.

The domain system provides such a feature using the canonical name (CNAME) RR. A CNAME RR identifies its owner name as an alias, and specifies the corresponding canonical name in the RDATA section of the RR. If a CNAME RR is present at a node, no other data should be present; this ensures that the data for a canonical name and its aliases cannot be different.
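
Following an alias to its canonical name can be sketched as below. The RECORDS table is a hand-written stand-in for zone data, built from the names used in this document. The lookup_address function and its max_hops limit are assumptions for illustration.

```python
# Hypothetical zone data: two aliases pointing at one canonical name.
RECORDS = {
    "people.csa.iisc.ernet.in.": ("CNAME", "deimos.csa.iisc.ernet.in."),
    "cl.csa.iisc.ernet.in.": ("CNAME", "deimos.csa.iisc.ernet.in."),
    "deimos.csa.iisc.ernet.in.": ("A", "144.16.67.57"),
}


def lookup_address(name, max_hops=8):
    """Return (canonical_name, address), or None if the name is unknown.

    max_hops guards against aliases that point at each other in a loop.
    """
    for _ in range(max_hops):
        record = RECORDS.get(name)
        if record is None:
            return None
        rr_type, rdata = record
        if rr_type == "A":
            return name, rdata
        name = rdata        # CNAME: restart the lookup at the canonical name
    raise RuntimeError("CNAME chain too long")
```

Because both aliases carry only a CNAME record, the address is stored once, under the canonical name, and cannot drift apart between the names.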

Queries

Queries are messages which may be sent to a name server to provoke a response. The user does not generate queries directly, but instead makes a request to a resolver which in turn sends one or more queries to name servers and deals with the error conditions and referrals that may result.

DNS queries and responses are carried in a standard message format. The message format has a header containing a number of fixed fields which are always present, and four sections which carry query parameters and RRs.

The four sections are:

Question	Carries the query name and other query parameters.
Answer	Carries RRs which directly answer the query.
Authority	Carries RRs which describe other authoritative servers. May optionally carry the SOA RR for the authoritative data in the answer section.
Additional	Carries RRs which may be helpful in using the RRs in the other sections.
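
As a rough illustration of the message format, the following Python sketch builds a query containing only the header and one entry in the Question section. The layout used (a 12-byte header of six 16-bit fields, followed by the name as length-prefixed labels and 16-bit type and class values) follows RFC 1035. The function names and the default query ID are arbitrary choices for this example.

```python
import struct


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if not label:
            continue                      # the root name has no labels
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, query_id=0x1234, qtype=1, qclass=1, recursion_desired=True):
    """Build a DNS query message: header plus one question.

    qtype=1 means an A record and qclass=1 means the IN class.
    """
    flags = 0x0100 if recursion_desired else 0      # RD bit
    # Header: ID, flags, then the entry counts of the four sections
    # (one question, no answer, authority or additional records).
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, qclass)
    return header + question
```

A query for www.google.com built this way is 32 bytes long: 12 bytes of header, 16 bytes for the encoded name and 4 bytes for the type and class.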

Name Servers

Name servers are the repositories of information that make up the domain database. The database is divided into sections called zones, which are distributed among the name servers. A given zone will be available from several name servers to ensure its availability in spite of host or communication link failure. Every zone is required to be available on at least two servers. The "csa.iisc.ernet.in" zone has four name servers: "ece.iisc.ernet.in", "drona.csa.iisc.ernet.in", "csa.iisc.ernet.in" and "iisc.ernet.in". Notice that "csa.iisc.ernet.in" is both a zone and a host name.

A given name server will typically support one or more zones. For example, "ece.iisc.ernet.in" is a name server for "iisc.ernet.in" as well as "csa.iisc.ernet.in".

The data that describes a zone has four major parts:

Authoritative data for all nodes within the zone: The authoritative data for a zone is simply all of the RRs attached to all of the nodes from the top node of the zone down to leaf nodes
Data that defines the top node of the zone: Data that defines the top node of the zone consists of RRs that list all of the name servers for the zone, and a single SOA RR that describes zone management parameters.
Data that describes delegated subzones: These are NS RRs that name the servers for the subzones. These RRs are NOT part of the authoritative data of the zone, and should be exactly the same as the corresponding RRs in the top node of the subzone.
Data that allows access to name servers for subzones (sometimes called "glue" data): One of the goals of the zone structure is that any zone have all the data required to set up communications with the name servers for any subzones. That is, parent zones have all the information needed to access servers for their children zones. The NS RRs that name the servers for subzones are often not enough for this task since they name the servers, but do not give their addresses. In particular, if the name of the name server is itself in the subzone, we could be faced with the situation where the NS RRs tell us that in order to learn a name server's address, we should contact the server using the address we wish to learn. To fix this problem, a zone contains "glue" RRs which are not part of the authoritative data, and are address RRs for the servers. These RRs are only necessary if the name server's name is "below" the cut, and are only used as part of a referral response.

When some organization wants to control its own domain, the first step is to identify the proper parent zone, and get the parent zone's owners to agree to the delegation of control.

Once the proper name for the new subzone is selected, the new owners are required to demonstrate redundant name server support. Note that there is no requirement that the servers for a zone reside in a host which has a name in that domain. In many cases, a zone will be more accessible to the Internet at large if its servers are widely distributed rather than being within the physical facilities controlled by the same organization that manages the zone.

As the last installation step, the delegation NS RRs and glue RRs necessary to make the delegation effective should be added to the parent zone. The administrators of both zones should ensure that the NS and glue RRs which mark both sides of the cut are consistent and remain so.

There are two modes in which a name server can operate, recursive and non-recursive. The way that the name server answers a query depends upon the mode in which it is operating.

In non-recursive mode, the server answers queries using only local information: the response contains an error, the answer, or a referral to some other server "closer" to the answer. The resolver may need to consult multiple name servers before it has the answer. This mode is appropriate if the requester is capable of pursuing referrals.
In recursive mode, the name server acts in the role of a resolver and returns either an error or the answer, but never referrals. If the server needs to consult another name server, it will do so. The use of recursive mode is limited to cases where both the client and the name server agree to its use. The agreement is negotiated through the use of the RD (recursion desired) bit in the query and the RA (recursion available) bit in the response messages.
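
The RD and RA bits live in the 16-bit flags field of the message header. Below is a minimal Python sketch of decoding that field, using the bit positions defined in RFC 1035. The function names and the dictionary keys are choices made for this example.

```python
def decode_flags(flags):
    """Split the 16-bit header flags field into the commonly used bits."""
    return {
        "qr": bool(flags & 0x8000),   # set in responses, clear in queries
        "aa": bool(flags & 0x0400),   # authoritative answer
        "tc": bool(flags & 0x0200),   # message was truncated
        "rd": bool(flags & 0x0100),   # recursion desired (set by the client)
        "ra": bool(flags & 0x0080),   # recursion available (set by the server)
        "rcode": flags & 0x000F,      # response code, 0 means no error
    }


def recursion_in_effect(flags):
    """Recursive mode applies only when both sides agreed to it."""
    decoded = decode_flags(flags)
    return decoded["rd"] and decoded["ra"]
```

A flags value of 0x8180, for example, decodes to a response with both RD and RA set and a response code of 0.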

Resolvers

A resolver is a program that extracts information from name servers in response to client requests. The resolver is located on the same machine as the program that requests the resolver's services, but it may need to consult name servers on other hosts.

A very important goal of the resolver is to eliminate network delay and name server load from most requests by answering them from its cache of prior results.
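
A cache of this kind can be sketched in Python as a dictionary whose entries expire after their TTL. The TtlCache class is a hypothetical illustration rather than the way any particular resolver is implemented. The clock is passed in so that the behaviour can be checked without waiting.

```python
import time


class TtlCache:
    """Remember answers until their time to live runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}      # lower-cased name -> (address, expiry time)

    def put(self, name, address, ttl):
        """Store an answer that may be reused for ttl seconds."""
        self._entries[name.lower()] = (address, self._clock() + ttl)

    def get(self, name):
        """Return the cached address, or None if absent or expired."""
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None
        address, expires = entry
        if self._clock() >= expires:
            del self._entries[key]      # expired: the name must be asked again
            return None
        return address
```

Names are lower-cased before they are stored because domain name comparisons are case-insensitive.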

One option for implementing a resolver is to move the resolution function out of the local machine and into a name server which supports recursive queries.

How is "people.csa.iisc.ernet.in" resolved ?

Let us look at the name servers, starting from the root, that are queried while resolving "people.csa.iisc.ernet.in".

Root server: A.ROOT-SERVERS.NET, IP address - 198.41.0.4

This returns the name servers for the "in" zone.

"in" Name Server: naamak.ncst.ernet.in, IP address - 202.41.110.66

This returns the name servers for the "iisc" zone directly, because it is also the authoritative server for the "ernet" zone.

"iisc" Name Server: ece.iisc.ernet.in, IP address - 144.16.64.2

This returns the name servers for the "csa" zone.

"csa" Name Server: csa.iisc.ernet.in, IP address - 144.16.67.8

This returns the fact that people.csa.iisc.ernet.in is an alias for deimos, and that the IP address of deimos is 144.16.67.57.

How do you configure the resolver in Linux ?

The resolver is a set of routines in the C library that provide access to the Internet Domain Name System.

One of the resolver configuration files in Linux is "/etc/resolv.conf". The man page describing the format of the resolv.conf file can be viewed with the following command:

# man 5 resolver

The configuration file contains a list of name servers that the resolver should query, and the domains that will be searched if a short (relative) domain name is specified. By default, the local domain name is used to complete relative domain names.

Sample resolv.conf files taken from hosts in CL and LITEC are listed below.

resolv.conf from a host in CL (IISc Computing Lab)

nameserver 144.16.67.16

This gives the IP address of a single name server. Since no search list is given, the resolver uses the local domain name "csa.iisc.ernet.in" to complete incomplete (relative) domain names. The local domain name is determined from the local host name returned by "gethostname".

resolv.conf from a host in LITEC (INTEL Laboratory for Internet Technologies and Electronic Commerce)

search csa.iisc.ernet.in
nameserver 144.16.67.13
nameserver 144.16.64.3

This gives the IP addresses of two name servers and explicitly specifies "csa.iisc.ernet.in" as the domain to use for completing incomplete (relative) domain names.
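
To show how such a file is interpreted, here is a minimal Python sketch that reads the "nameserver", "search" and "domain" keywords. The parse_resolv_conf function is a hypothetical helper, not part of the C library resolver, and it ignores every other keyword and option.

```python
def parse_resolv_conf(text):
    """Extract the name servers and the search list from resolv.conf text."""
    nameservers = []
    search = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue                        # blank line or comment
        fields = line.split()
        keyword, values = fields[0], fields[1:]
        if keyword == "nameserver" and values:
            nameservers.append(values[0])   # kept in the order listed
        elif keyword == "search":
            search = values                 # the last search/domain line wins
        elif keyword == "domain" and values:
            search = [values[0]]
    return {"nameservers": nameservers, "search": search}
```

Applied to the LITEC file above, this yields two name servers and a search list containing only "csa.iisc.ernet.in". For the CL file the search list stays empty, which is the case where the resolver falls back to the local domain name.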

The resolver also uses the "/etc/host.conf" file, which contains configuration information specific to the resolver library. It is commonly used to specify the order in which lookups are performed. For example, the resolver may be asked to look in the "/etc/hosts" file (described next) before querying the name servers. Its man page can be viewed with:

# man host.conf

A sample host.conf file is

order hosts,bind

This instructs the resolver to look in the "/etc/hosts" file before querying the name server.

The resolver may use the "/etc/hosts" file, a simple text file that associates IP addresses with host names. This file usually contains the host name to IP address mappings of the local hosts. Its use is explained above. The appropriate man page can be viewed with:

# man hosts

A few lines from the "/etc/hosts" file in LITEC are listed below

   #......................
   144.16.67.16    europa.csa.iisc.ernet.in europa
   144.16.67.21    church.csa.iisc.ernet.in church pgsqldb mysqldb
   144.16.67.22    herbrand.csa.iisc.ernet.in herbrand
   144.16.67.23    hilbert.csa.iisc.ernet.in hilbert
   144.16.67.24    erdos.csa.iisc.ernet.in erdos
   144.16.67.26    garnet.csa.iisc.ernet.in garnet
   #.....................
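
A lookup in a file of this format can be sketched in Python as follows. The parse_hosts function is a hypothetical helper that maps every name and alias on a line to the address at the start of that line. When a name appears more than once, the sketch assumes that the first entry wins.

```python
def parse_hosts(text):
    """Build a name -> address table from /etc/hosts style text."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()     # drop comments
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Host names are compared case-insensitively. The first
            # entry for a name is kept.
            table.setdefault(name.lower(), address)
    return table
```

With the lines listed above, both "church" and its alias "pgsqldb" map to 144.16.67.21.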

Tools to query name server in Linux

The dig (domain information groper), nslookup and host commands are available for querying name servers from the command line. Their man pages may be viewed for more information.

When dig was run for www.google.com, it gave the following results:

   $ dig www.google.com

   ; <<>> DiG 9.2.1 <<>> www.google.com
   ;; global options:  printcmd
   ;; Got answer:
   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10840
   ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 9, ADDITIONAL: 9

   ;; QUESTION SECTION:
   ;www.google.com.                        IN      A

   ;; ANSWER SECTION:
   www.google.com.         1036    IN      CNAME   www.google.akadns.net.
   www.google.akadns.net.  184     IN      A       216.239.57.99

   ;; AUTHORITY SECTION:
   akadns.net.             30092   IN      NS      zc.akadns.net.
   akadns.net.             30092   IN      NS      zf.akadns.net.
   akadns.net.             30092   IN      NS      zh.akadns.net.
   akadns.net.             30092   IN      NS      ns1-159.akam.net.
   akadns.net.             30092   IN      NS      use2.akam.net.
   akadns.net.             30092   IN      NS      usw5.akam.net.
   akadns.net.             30092   IN      NS      use4.akam.net.
   akadns.net.             30092   IN      NS      asia3.akam.net.
   akadns.net.             30092   IN      NS      a-93.akadns.net.

   ;; ADDITIONAL SECTION:
   zc.akadns.net.          22715   IN      A       63.241.199.50
   zf.akadns.net.          22715   IN      A       63.215.198.79
   zh.akadns.net.          22715   IN      A       63.208.48.42
   ns1-159.akam.net.       22715   IN      A       193.108.91.159
   use2.akam.net.          18022   IN      A       63.209.170.136
   usw5.akam.net.          22505   IN      A       63.241.73.214
   use4.akam.net.          10888   IN      A       80.67.67.182
   asia3.akam.net.         29884   IN      A       193.108.154.9
   a-93.akadns.net.        22715   IN      A       193.108.91.93

   ;; Query time: 19 msec
   ;; SERVER: 144.16.64.2#53(144.16.64.2)
   ;; WHEN: Thu Nov 17 00:17:56 2003
   ;; MSG SIZE  rcvd: 415

From the above output, we see that www.google.com is an alias whose canonical name is www.google.akadns.net. dig also gives us the IP address of the canonical name. Additionally, we are given the list of name servers for akadns.net and their IP addresses. The name server used for the query was 144.16.64.2.

What port numbers are used for name server access ?

The Internet supports name server access using TCP on server port 53 (decimal) as well as datagram access using UDP on port 53 (decimal).

Further References

If you require a more detailed understanding of the concepts explained above, visit the links given below
