The Domain Name System
Every computer on the Internet has an IP address (similar to a telephone number), because the Internet infrastructure only understands IP addresses. IP addresses (IPv4) are 32-bit numbers, usually written as four decimal numbers separated by dots. Fortunately, when we access the web we specify a domain name instead of an IP address. For example, the URL "http://www.google.com" contains the domain name www.google.com. Imagine how difficult it would be if we had to specify 216.239.41.99 (the IP address) instead of www.google.com (the domain name). If that is not convincing enough, imagine having to remember the IP address of every website you access.
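To make the "32-bit number written as four dotted numbers" idea concrete, here is a minimal sketch using Python's standard `ipaddress` module. The address is the example from the text; the helper names `dotted_to_int` and `int_to_dotted` are made up for this illustration.

```python
import ipaddress


def dotted_to_int(dotted: str) -> int:
    """Return the 32-bit integer behind a dotted-decimal IPv4 address."""
    return int(ipaddress.IPv4Address(dotted))


def int_to_dotted(value: int) -> str:
    """Return the dotted-decimal form of a 32-bit integer."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    number = dotted_to_int("216.239.41.99")
    print(number)                 # the address as a single integer
    print(int_to_dotted(number))  # back to the dotted-decimal form
```

Each of the four decimal numbers is one byte of the 32-bit value, which is why none of them can exceed 255.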
The Domain Name System (DNS) is the system that allows us to use a familiar string of letters (the "domain name") instead of the arcane IP address. The DNS translates the domain name into the IP address behind our backs and makes life easy for us. It also makes sure that the email that you send to someone@hotmail.com reaches the correct IP address.
The DNS is a hierarchical structure, somewhat like an inverted tree. At the very top is the root, below which are the various Top Level Domains (TLDs), including org, com, net, edu, info and biz, and the country-level domains such as in (India), fi (Finland) and ca (Canada). Below these are the various second-level domains such as yahoo, hotmail, ernet and google. There may be up to 127 levels. The levels are separated by a dot ("."). The lowermost level (the one farthest from the root) is mentioned first. "people.csa.iisc.ernet.in" is an example of a five-level domain name.
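The ordering of levels can be seen by splitting a name at the dots and reading it backwards. The following sketch does that for the example name; the function name `labels_from_root` is hypothetical.

```python
def labels_from_root(domain_name):
    """Split a domain name into its levels, ordered from the TLD downwards."""
    # A trailing dot only marks the root, so drop it before splitting.
    labels = domain_name.rstrip(".").split(".")
    return list(reversed(labels))


if __name__ == "__main__":
    path = labels_from_root("people.csa.iisc.ernet.in")
    print(path)       # levels ordered from "in" down to "people"
    print(len(path))  # number of levels in the name
```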
This tree structure is divided into "zones", each of which has its own "name server". Name servers are computers that remember the host name to IP address mappings of all the hosts in a zone. They also know the addresses of the name servers for their "sub domains". In essence, the name server for a particular zone has information that can lead you to any host that lies below it in the tree. If it does not have the IP address of that host, it can give you the address of a name server that may have it. At the heart of the DNS are the 13 root servers, which serve the root zone and know the name servers of every TLD. The servers are uniquely named A, B, C and so on up to M. Ten of these servers reside in the United States, one in Japan, one in London, and one in Stockholm, Sweden. Every name server knows the IP addresses of the root servers.
Every host computer has a program called a "resolver" that takes the domain name from the user and converts it into an IP address. The resolver has the IP addresses of a few DNS name servers which it can query. Let us say you wanted to find the IP address of www.google.com. You would ask your resolver, which in turn would ask one of the known name servers:
Do you know the IP address of www.google.com?
Assuming that the name server does not know the answer, it would reply:
I don't know the IP address of www.google.com, but you could ask the root server, here is its address.
The resolver would now ask the root server the same question:
Do you know the IP address of www.google.com?
The root server would reply:
I don't know the IP address of www.google.com, but you could ask the name server for the com domain, here is its address.
The resolver would now query the name server for the com domain. The name server would reply:
I don't know the IP address of www.google.com, but you could ask the name server for the google domain, here is its address.
The resolver would now query the google name server, which would reply:
I know the IP address of www.google.com, it is 216.239.41.99.
The resolver would return this IP address to the user and would cache it in its local memory so that it doesn't have to ask this many questions if the user wants to know the same IP address again. We see that if one DNS server doesn't know how to translate a particular domain name, the question is passed on to another one, and so on, until the correct IP address is returned.
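The dialogue above can be sketched as a toy simulation that never touches the network. The server names ("local", "root", "com-server", "google-server") and the table layout are hypothetical; only the mapping of www.google.com to 216.239.41.99 comes from the text.

```python
# Hypothetical name servers: each one knows some hosts, some delegated
# zones, or (for the local server) only where the root server is.
SERVERS = {
    "local": {"fallback": "root"},
    "root": {"zones": {"com": "com-server"}},
    "com-server": {"zones": {"google.com": "google-server"}},
    "google-server": {"hosts": {"www.google.com": "216.239.41.99"}},
}


def ask(server, name):
    """Ask one server; return ("answer", ip) or ("referral", next_server)."""
    data = SERVERS[server]
    if name in data.get("hosts", {}):
        return "answer", data["hosts"][name]
    for zone, next_server in data.get("zones", {}).items():
        if name == zone or name.endswith("." + zone):
            return "referral", next_server
    if "fallback" in data:
        return "referral", data["fallback"]
    raise LookupError(f"{server} cannot help with {name}")


def resolve(name, cache, first_server="local", max_steps=10):
    """Follow referrals until an answer arrives; return (ip, servers_asked)."""
    if name in cache:
        return cache[name], []      # answered from the cache, nobody is asked
    server, asked = first_server, []
    for _ in range(max_steps):
        asked.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            cache[name] = value     # remember the result for next time
            return value, asked
        server = value              # a referral: try the suggested server
    raise LookupError(f"too many referrals for {name}")


if __name__ == "__main__":
    cache = {}
    print(resolve("www.google.com", cache))  # walks through all four servers
    print(resolve("www.google.com", cache))  # second call is served from cache
```

The second call returns an empty list of servers, which is the saving the cache is meant to provide.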
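The DNS has three major components: the domain name space and its resource records, the name servers, and the resolvers. Each is described in turn below.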
The domain name space is a tree structure. Each node on the tree has a label ("node" refers to both interior nodes and leaves). The domain name of a node is the list of the labels on the path from the node to the root of the tree. By convention, the labels that compose a domain name are printed or read left to right, from the most specific (lowest, farthest from the root) to the least specific (highest, closest to the root). When a user needs to type a domain name, the labels are separated by dots (".").
Domain name comparisons for all present domain functions are done in a case-insensitive manner. That is, two names with the same spelling but different case are to be treated as if identical.
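A minimal sketch of such a comparison is shown below. It lowercases each label before comparing; treating a name with and without the trailing dot as the same name is a simplifying assumption of this sketch, and the function name `same_domain_name` is hypothetical.

```python
def same_domain_name(first, second):
    """Compare two domain names label by label, ignoring letter case."""
    def labels(name):
        # Assumption of this sketch: a trailing dot does not change the name.
        return [label.lower() for label in name.rstrip(".").split(".")]
    return labels(first) == labels(second)


if __name__ == "__main__":
    print(same_domain_name("WWW.Google.COM", "www.google.com"))
    print(same_domain_name("www.google.com", "mail.google.com"))
```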
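We distinguish between absolute and relative domain names: an absolute name is written with a trailing dot (for example "people.csa.iisc.ernet.in.") and is complete as it stands, whereas a relative name (for example "people") has no trailing dot and must be completed by local software using a known domain.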
A domain name identifies a node, and each node has a set of resource information associated with it. The set of resource information associated with a particular name is composed of separate resource records (RRs).
Each Resource Record (RR) has the following:
Owner | This field defines what domain name applies to the given RR.
Type | A value that specifies the type of data/resource held in this resource record. The possible values and their meanings are:
  A | A host IP address
  CNAME | Identifies its owner name as an alias and gives its canonical name
  HINFO | Identifies the CPU and OS used by a host
  MX | Identifies a mail exchange for the domain (the machine to which mail for a domain name should be delivered)
  NS | The authoritative name server for the domain
  PTR | A pointer to another part of the domain name space
  SOA | The Start Of Authority record designates the start of a zone. The zone ends at the next SOA record.
Class | A value which identifies a protocol family. The possible values are:
  IN | The Internet system
  CH | The Chaos system
TTL | TTL stands for Time To Live. It specifies how long a domain resolver should cache the RR before it throws it out and asks a domain server again.
RDATA | The actual data stored in the resource record. The data field is defined differently for each type and class of data. The data stored for the various types is:
  A | For the IN class, a 32-bit IP address in dotted decimal form.
  CNAME | The canonical domain name corresponding to this alias.
  HINFO | The HINFO record gives information about a particular host. The data is two strings separated by whitespace. The first string is a hardware description and the second is software. The hardware is usually a manufacturer name followed by a dash and model designation. The software string is usually the name of the operating system.
  MX | A 16-bit preference value (lower is better) followed by a host name willing to act as a mail exchange for the owner domain.
  NS | A host name of a machine that provides domain service for this particular domain.
  PTR | A domain name. PTR records are mainly used for translation of IP addresses to names.
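  SOA | Contains several fields giving details about the zone: the primary name server (MNAME), the mailbox of the person responsible for the zone (RNAME), a serial number, and the refresh, retry, expire and minimum TTL intervals.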
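As an illustration of the five fields listed above, a resource record can be modelled as a small immutable structure. This is only a sketch: the class and function names are hypothetical, the RDATA is kept as plain text, and the TTL of 3600 seconds is a made-up value. The host names and the address are the ones used elsewhere in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    owner: str   # domain name the record applies to
    rtype: str   # kind of data, e.g. "A", "CNAME", "MX", "NS"
    rclass: str  # protocol family, "IN" for the Internet
    ttl: int     # seconds a resolver may cache the record
    rdata: str   # type-specific data, kept as text in this sketch


# The TTL values below are invented for illustration.
RECORDS = [
    ResourceRecord("people.csa.iisc.ernet.in", "CNAME", "IN", 3600,
                   "deimos.csa.iisc.ernet.in"),
    ResourceRecord("deimos.csa.iisc.ernet.in", "A", "IN", 3600,
                   "144.16.67.57"),
]


def find_records(records, owner, rtype):
    """Return the records whose owner (ignoring case) and type both match."""
    return [
        rr for rr in records
        if rr.owner.lower() == owner.lower() and rr.rtype == rtype
    ]


if __name__ == "__main__":
    for rr in find_records(RECORDS, "DEIMOS.csa.iisc.ernet.in", "A"):
        print(rr.owner, rr.ttl, rr.rclass, rr.rtype, rr.rdata)
```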
In existing systems, hosts and other resources often have several names that identify the same resource. For example, the domain names "cl.csa.iisc.ernet.in", "people.csa.iisc.ernet.in" and "deimos.csa.iisc.ernet.in" refer to a single machine. Most of these systems have a notion that one of the equivalent set of names is the canonical or primary name and all others are aliases. For example, in the above case, "deimos.csa.iisc.ernet.in" is the canonical name while the other two are aliases.
The domain system provides such a feature using the canonical name (CNAME) RR. A CNAME RR identifies its owner name as an alias, and specifies the corresponding canonical name in the RDATA section of the RR. If a CNAME RR is present at a node, no other data should be present; this ensures that the data for a canonical name and its aliases cannot be different.
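The way an alias leads to an address can be sketched as a short loop that follows CNAME records until it reaches a name that has an address. The names and the address are the examples from this document; the two dictionaries and the function name `lookup_address` are an illustrative layout, not how a real name server stores its data.

```python
# Alias -> canonical name (the CNAME records of the example).
CNAMES = {
    "cl.csa.iisc.ernet.in": "deimos.csa.iisc.ernet.in",
    "people.csa.iisc.ernet.in": "deimos.csa.iisc.ernet.in",
}
# Canonical name -> address (the A record of the example).
ADDRESSES = {"deimos.csa.iisc.ernet.in": "144.16.67.57"}


def lookup_address(name, max_hops=8):
    """Follow CNAME records, then return (canonical_name, ip_address)."""
    name = name.lower().rstrip(".")
    for _ in range(max_hops):
        if name in CNAMES:
            name = CNAMES[name]        # the name is an alias: keep following
            continue
        return name, ADDRESSES[name]   # KeyError if the name is unknown
    raise RuntimeError("CNAME chain too long")


if __name__ == "__main__":
    print(lookup_address("people.csa.iisc.ernet.in"))
    print(lookup_address("deimos.csa.iisc.ernet.in"))
```

Because an alias node holds nothing but the CNAME, both lookups necessarily end at the same address record.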
Queries are messages which may be sent to a name server to provoke a response. The user does not generate queries directly, but instead makes a request to a resolver which in turn sends one or more queries to name servers and deals with the error conditions and referrals that may result.
DNS queries and responses are carried in a standard message format. The message format has a header containing a number of fixed fields which are always present, and four sections which carry query parameters and RRs.
The four sections are:
Question | Carries the query name and other query parameters.
Answer | Carries RRs which directly answer the query.
Authority | Carries RRs which describe other authoritative servers. May optionally carry the SOA RR for the authoritative data in the answer section.
Additional | Carries RRs which may be helpful in using the RRs in the other sections.
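To give a feel for the message format, the sketch below builds the bytes of a query that has a header and a single entry in the question section. The wire-level details (a 12-byte header of six 16-bit fields, length-prefixed labels, the numeric codes 1 for type A and 1 for class IN, and the position of the "recursion desired" bit) follow RFC 1035 rather than this text, and the query ID 0x1234 is an arbitrary value chosen for illustration.

```python
import struct


def build_query(name, query_id=0x1234, recursion_desired=True):
    """Build the bytes of a DNS query asking for the A record of `name`."""
    flags = 0x0100 if recursion_desired else 0x0000  # only the RD bit is set
    # Header: ID, flags, then the number of entries in each of the four
    # sections (question, answer, authority, additional).
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # The query name is sent as length-prefixed labels ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)      # QTYPE=A, QCLASS=IN
    return header + question


if __name__ == "__main__":
    message = build_query("www.google.com")
    print(len(message), message.hex())
```

A query leaves the answer, authority and additional counts at zero; the server fills those sections in its response.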
Name servers are the repositories of information that make up the domain database. The database is divided up into sections called zones, which are distributed among the name servers. A given zone will be available from several name servers to ensure its availability in spite of host or communication link failure. Every zone is required to be available on at least two servers. The "csa.iisc.ernet.in" zone has four name servers: "ece.iisc.ernet.in", "drona.csa.iisc.ernet.in", "csa.iisc.ernet.in" and "iisc.ernet.in". Notice that "csa.iisc.ernet.in" is a zone and also a host name.
A given name server will typically support one or more zones. For example, "ece.iisc.ernet.in" is a name server for "iisc.ernet.in" as well as "csa.iisc.ernet.in".
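The data that describes a zone has four major parts: authoritative data for all nodes within the zone; data that defines the top node of the zone; data that describes delegated subzones; and "glue" data that allows access to the name servers for those subzones.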
When some organization wants to control its own domain, the first step is to identify the proper parent zone, and get the parent zone's owners to agree to the delegation of control.
Once the proper name for the new subzone is selected, the new owners are required to demonstrate redundant name server support. Note that there is no requirement that the servers for a zone reside in a host which has a name in that domain. In many cases, a zone will be more accessible to the internet at large if its servers are widely distributed rather than being within the physical facilities controlled by the same organization that manages the zone.
As the last installation step, the delegation NS RRs and glue RRs necessary to make the delegation effective should be added to the parent zone. The administrators of both zones should ensure that the NS and glue RRs which mark both sides of the cut are consistent and remain so.
There are two modes in which a name server can operate, recursive and non-recursive. The way that the name server answers a query depends upon the mode in which it is operating.
A resolver is a program that extracts information from name servers in response to client requests. The resolver is located on the same machine as the program that requests the resolver's services, but it may need to consult name servers on other hosts.
A very important goal of the resolver is to eliminate network delay and name server load from most requests by answering them from its cache of prior results.
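A cache of prior results has to honour the TTL of each record. The following is a minimal sketch of such a cache, assuming one address per name; the class name `TtlCache` is hypothetical, and the clock is passed in so that expiry can be demonstrated without waiting.

```python
import time


class TtlCache:
    """Remember name -> address results until their TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, time at which it expires)

    def put(self, name, address, ttl):
        """Store an answer that may be reused for `ttl` seconds."""
        self._entries[name.lower()] = (address, self._clock() + ttl)

    def get(self, name):
        """Return the cached address, or None if absent or expired."""
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # too old: a name server must be asked again
            return None
        return address


if __name__ == "__main__":
    now = [0.0]                     # a fake clock that we advance by hand
    cache = TtlCache(clock=lambda: now[0])
    cache.put("www.google.com", "216.239.41.99", ttl=184)
    print(cache.get("www.google.com"))  # still fresh
    now[0] = 200.0
    print(cache.get("www.google.com"))  # TTL has passed
```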
One option for implementing a resolver is to move the resolution function out of the local machine and into a name server which supports recursive queries.
Let us see the different name servers, starting from the root, that are queried while resolving "people.csa.iisc.ernet.in".
Root server: A.ROOT-SERVERS.NET, IP address - 198.41.0.4
This returns the name servers for the "in" zone.
"in" name server: naamak.ncst.ernet.in, IP address - 202.41.110.66
This returns the name servers for the "iisc" zone, because it is also the authoritative server for the "ernet" zone.
"iisc" name server: ece.iisc.ernet.in, IP address - 144.16.64.2
This returns the name servers for the "csa" zone.
"csa" name server: csa.iisc.ernet.in, IP address - 144.16.67.8
This returns the fact that people.csa.iisc.ernet.in is an alias for deimos and that the IP address of deimos is 144.16.67.57.
The resolver is a set of routines in the C library that provide access to the Internet Domain Name System.
One of the resolver configuration files on Linux is "/etc/resolv.conf". The man page describing the format of the resolv.conf file can be viewed by typing the following command:
# man 5 resolver
The configuration file contains a list of name servers that the resolver should query and the domains that will be searched if a short/relative domain name is specified. By default, the local domain name is used to complete relative domain names.
Sample resolv.conf files taken from hosts in CL and LITEC are listed below.
resolv.conf from a host in CL (IISc Computing Lab)
nameserver 144.16.67.16
This gives the IP address of a single name server and asks the resolver to use the local domain name "csa.iisc.ernet.in" to complete incomplete (relative) domain names. The local domain name is determined from the local host name returned by "gethostname".
resolv.conf from a host in LITEC (INTEL Laboratory for Internet Technologies and Electronic Commerce)
search csa.iisc.ernet.in
nameserver 144.16.67.13
nameserver 144.16.64.3
This gives the IP addresses of two name servers and explicitly specifies "csa.iisc.ernet.in" as the domain to use for completing incomplete (relative) domain names.
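To show how these two kinds of lines are used, here is a rough sketch that reads resolv.conf-style text and then lists the names a resolver might try for a relative name. It handles only the "nameserver" and "search" keywords, and the order of candidates is a simplification: a real resolver also takes other options (such as ndots) into account. The function names are hypothetical.

```python
LITEC_RESOLV_CONF = """\
search csa.iisc.ernet.in
nameserver 144.16.67.13
nameserver 144.16.64.3
"""


def parse_resolv_conf(text):
    """Return (nameservers, search_domains) from resolv.conf-style text."""
    nameservers, search = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue                      # blank line or comment
        fields = line.split()
        keyword, values = fields[0], fields[1:]
        if keyword == "nameserver" and values:
            nameservers.append(values[0])
        elif keyword == "search":
            search = values               # a later search line replaces an earlier one
    return nameservers, search


def candidate_names(name, search):
    """List the absolute names to try for `name` (simplified ordering)."""
    if name.endswith("."):
        return [name]                     # already absolute, nothing to add
    return [f"{name}.{domain}." for domain in search] + [name + "."]


if __name__ == "__main__":
    servers, domains = parse_resolv_conf(LITEC_RESOLV_CONF)
    print(servers)
    print(candidate_names("people", domains))
```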
The resolver also uses the "/etc/host.conf" file, which contains configuration information specific to the resolver library. It is commonly used to specify the order in which lookups are performed. The resolver may be asked to look in the "/etc/hosts" file (described next) before querying the name servers.
Its man page can be viewed by
# man host.conf
A sample host.conf file is
order hosts,bind
This instructs the resolver to look in the "/etc/hosts" file before querying the name server.
The resolver may use the "/etc/hosts" file, a simple text file that associates IP addresses with hostnames.
This file usually contains the hostname to IP address mappings of the local hosts. Its use is explained above.
The appropriate man page can be viewed by
# man hosts
A few lines from the "/etc/hosts" file in LITEC are listed below
#......................
144.16.67.16 europa.csa.iisc.ernet.in europa
144.16.67.21 church.csa.iisc.ernet.in church pgsqldb mysqldb
144.16.67.22 herbrand.csa.iisc.ernet.in herbrand
144.16.67.23 hilbert.csa.iisc.ernet.in hilbert
144.16.67.24 erdos.csa.iisc.ernet.in erdos
144.16.67.26 garnet.csa.iisc.ernet.in garnet
#.....................
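Each line of the file is an IP address followed by a host name and any number of aliases. A minimal sketch of reading such text into a lookup table follows; it uses three of the lines listed above, and the function name `parse_hosts` is hypothetical.

```python
SAMPLE_HOSTS = """\
#......................
144.16.67.16 europa.csa.iisc.ernet.in europa
144.16.67.21 church.csa.iisc.ernet.in church pgsqldb mysqldb
144.16.67.22 herbrand.csa.iisc.ernet.in herbrand
#.....................
"""


def parse_hosts(text):
    """Map every host name and alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on blanks
        if len(fields) < 2:
            continue                            # blank or comment-only line
        address, names = fields[0], fields[1:]
        for name in names:
            # Keep the first address seen for a name (a choice of this sketch).
            table.setdefault(name.lower(), address)
    return table


if __name__ == "__main__":
    hosts = parse_hosts(SAMPLE_HOSTS)
    print(hosts["church"])
    print(hosts["pgsqldb"])
```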
The dig (domain information groper), nslookup and host commands are available for querying name servers from the command line. Their man pages may be viewed for more information.
When dig was queried for www.google.com, it gave the following results
$ dig www.google.com

; <<>> DiG 9.2.1 <<>> www.google.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10840
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 9, ADDITIONAL: 9

;; QUESTION SECTION:
;www.google.com. IN A

;; ANSWER SECTION:
www.google.com. 1036 IN CNAME www.google.akadns.net.
www.google.akadns.net. 184 IN A 216.239.57.99

;; AUTHORITY SECTION:
akadns.net. 30092 IN NS zc.akadns.net.
akadns.net. 30092 IN NS zf.akadns.net.
akadns.net. 30092 IN NS zh.akadns.net.
akadns.net. 30092 IN NS ns1-159.akam.net.
akadns.net. 30092 IN NS use2.akam.net.
akadns.net. 30092 IN NS usw5.akam.net.
akadns.net. 30092 IN NS use4.akam.net.
akadns.net. 30092 IN NS asia3.akam.net.
akadns.net. 30092 IN NS a-93.akadns.net.

;; ADDITIONAL SECTION:
zc.akadns.net. 22715 IN A 63.241.199.50
zf.akadns.net. 22715 IN A 63.215.198.79
zh.akadns.net. 22715 IN A 63.208.48.42
ns1-159.akam.net. 22715 IN A 193.108.91.159
use2.akam.net. 18022 IN A 63.209.170.136
usw5.akam.net. 22505 IN A 63.241.73.214
use4.akam.net. 10888 IN A 80.67.67.182
asia3.akam.net. 29884 IN A 193.108.154.9
a-93.akadns.net. 22715 IN A 193.108.91.93

;; Query time: 19 msec
;; SERVER: 144.16.64.2#53(144.16.64.2)
;; WHEN: Thu Nov 17 00:17:56 2003
;; MSG SIZE rcvd: 415
From the above information, we see that the canonical name for www.google.com is actually "www.google.akadns.net.". dig also gives us the IP address of the canonical name. Additionally, we are given the list of name servers for "akadns.net." and their IP addresses. The name server used for the query was 144.16.64.2.
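Every record line in the dig output has the same five columns: owner, TTL, class, type and data. The sketch below splits the two answer lines shown above into those fields; the function names are hypothetical and the parsing is only as thorough as this example needs.

```python
ANSWER_SECTION = """\
www.google.com. 1036 IN CNAME www.google.akadns.net.
www.google.akadns.net. 184 IN A 216.239.57.99
"""


def parse_record_line(line):
    """Split one record line of dig output into its five fields."""
    owner, ttl, rclass, rtype, rdata = line.split(None, 4)
    return {"owner": owner, "ttl": int(ttl), "class": rclass,
            "type": rtype, "rdata": rdata.strip()}


def parse_section(text):
    """Parse every record line, skipping blank lines and ';' comment lines."""
    return [
        parse_record_line(line)
        for line in text.splitlines()
        if line.strip() and not line.startswith(";")
    ]


if __name__ == "__main__":
    for record in parse_section(ANSWER_SECTION):
        print(record["owner"], record["type"], record["rdata"])
```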
The Internet supports name server access using TCP on server port 53 (decimal) as well as datagram access using UDP on port 53 (decimal).
If you require a more detailed understanding of the concepts explained above, visit the links given below