Defining the Internet is a bit like holding a conversation with Socrates. It is safe to say that the Internet is not a static entity but a complicated matrix of connections in a constant state of upgrade. Nor is it a single monolithic network: it comprises an eclectic collection of networks, each owned by a different operator and each different in some way, and this vast entanglement of networks is just the tip of the iceberg. Put in simple terms, the Internet can be described as a series of networks linked to each other. Nobody knows for sure how big the Internet actually is; it is estimated that thirty to forty million people are on the net at any given point of time, and this number is increasing at a rate of about ten percent each month. The Internet has four basic building blocks: servers, routers, connections and clients. The basics of the Internet and the topics studied are as under.

Origin. The Internet started as a military project conceived with the aim of building a fail-safe network capable of withstanding a limited amount of destruction. The project was funded and operated by the Advanced Research Projects Agency (ARPA), and the whole network was called the ARPANET. It started by linking major universities and important military installations in the USA. The development of TCP/IP as a packet-switched protocol was responsible for the reach of the Internet: the protocol allowed computers to communicate across various media such as optical fiber, radio links, sea cables and satellites. The "seed" of the Internet was the set of links the National Science Foundation established to various universities to give them access to the supercomputer centers it operated. These were 56 Kbps links running TCP/IP.

History. The history of the Internet, as of now, can be divided into three main parts.

1. FTP (File Transfer Protocol). FTP was (and still is) used to transfer files from one computer to another.
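The log-in-and-download interaction can be sketched with Python's standard ftplib module; the host name and file paths below are placeholders, not real servers.

```python
from ftplib import FTP

def fetch_file(host, remote_path, local_path,
               user="anonymous", password="guest@example.com"):
    """Log in to an FTP server and download one file in binary mode."""
    with FTP(host) as ftp:           # connects on the default FTP port, 21
        ftp.login(user, password)    # anonymous FTP uses an email as the password
        with open(local_path, "wb") as f:
            ftp.retrbinary("RETR " + remote_path, f.write)

# Usage (placeholder host and path):
# fetch_file("ftp.example.com", "/pub/readme.txt", "readme.txt")
```

Note that, exactly as the text says, nothing here browses the server; the caller must already know the path of the file it wants.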
A user logs in at an FTP server and downloads the required file. Though FTP allows files to be sent to and retrieved from a remote computer, it does not facilitate browsing, so a lot of time is spent searching for the required information. Nowadays FTP is mainly used to transfer large amounts of data (huge files, or many small files) from one machine to another. Various FTP clients are now available, most of which also have a graphical user interface.

a. Archie. As the Internet spread, anonymous FTP sites proliferated everywhere in near-total anarchy. A service called Archie was developed to simplify keyword searching of files located at FTP servers. Archie is not free software, though its client implementations are free and widely available. Among these are Web interfaces such as ArchiePlex.

2. Gopher. Gopher, documented in RFC 1436, is a non-hypertext precursor to the World Wide Web based on navigating a series of menus. It was a menu-style information browsing and retrieval system, developed at the University of Minnesota as a campus-wide information system and named after the university mascot, though some opine that Gopher stands for 'go-for' information. Unlike anonymous FTP, Gopher permits links to be constructed from one Gopher site to another, and it overcame many of FTP's shortcomings; however, as the content increased, menu navigation became arduous. The advent of the Web, with its graphical, hypertext presentation, has rendered Gopher largely obsolete.

a. Veronica. A search facility for Gopher, called Veronica, was developed along the lines of Archie for FTP. Jughead, a local search service for Gopher, was developed to facilitate searching of local networks. Owing to its lack of multimedia support and its linear nature, Gopher soon became extinct with the advent of the Web.

3. The World Wide Web. The Web came into existence with the introduction of browsers, the first of which was Mosaic.
The browser provided ease of use with a graphical display and was able to handle pictures. Hyperlinking between documents broke away from the linear architecture of Gopher and increased the complexity of the web. The browser was able to provide the user with a range of experiences, starting with pictures and extending to multimedia (sound, video) and interactivity. The web also allowed pages to be integrated with databases to provide dynamically generated content. The Web has two main components:

a. The HTML language, used to describe web pages.
b. The HTTP protocol, used to transfer HTML across the net.
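These two components can be seen working together in a small self-contained sketch: Python's standard http.server serves an HTML page over HTTP to a client running on the same machine. All names and the page content here are illustrative.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><h1>Hello, Web</h1></body></html>"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the same small HTML page for every request.
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.send_header("Content-length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):    # silence per-request logging
        pass

# Port 0 asks the OS for any free port; a real server would use port 80.
server = HTTPServer(("127.0.0.1", 0), Handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
    body = resp.read()               # the HTML a browser would render
server.shutdown()
print(body.decode())
```

The server side plays the role of the remote web server, and urllib plays the role of the browser; the HTML travels between them over HTTP exactly as described above.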
Important Technologies.

HTTP (HyperText Transfer Protocol). Computers on the World Wide Web use the HyperText Transfer Protocol to talk to each other. HTTP provides a set of instructions for accurate information exchange. Communication between the client (the browser) and the server (software located on a remote computer) involves requests sent by the client and responses from the server. Each client-server transaction, whether a request or a response, consists of three main parts:

1. A request or response line. A client connects to the server at port 80 (unless it has been changed by the system administrator) and sends in its request. The request line from the client consists of a request method, the address of the file requested and the HTTP version number.

2. Header information. After the request line comes the header, which consists of configuration information about the client and its document-viewing preferences. The header is a series of lines, each of which contains a specific detail about the client, and it ends with a blank line.

3. The body. The server now responds, and the response again consists of the same three parts. The response line contains the HTTP version number, a status code that indicates the result of the client's request, and a description of the status code in 'English'. The header that follows describes the data being sent, in lines such as Content-type and Content-length, while the Server line gives details about the server software. The header is followed by a blank line that indicates the end of the header information, and the body carries the data itself.

HTTP is a stateless protocol, which means that the connection between the browser and the server is lost once the transaction ends.

MIME (Multipurpose Internet Mail Extensions). MIME, documented in RFC 1521 and RFC 1522, defines the standard representation for "complex" message bodies: bodies that do not conform to the default of a single, human-readable, ASCII mail message.
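The three-part request and response structure described above under HTTP can be made concrete by composing a request and parsing a canned response with plain string handling. The host and file names are made up for illustration.

```python
# Compose a request: request line, header lines, then a blank line.
request = (
    "GET /index.html HTTP/1.0\r\n"    # method, address of the file, version
    "Host: www.example.com\r\n"       # header lines describe the client
    "Accept: text/html\r\n"
    "\r\n"                            # blank line ends the header
)

# A canned response in the same three-part shape.
raw_response = (
    "HTTP/1.0 200 OK\r\n"             # version, status code, description
    "Server: ExampleServer/1.0\r\n"   # details about the server software
    "Content-type: text/html\r\n"     # what kind of data is being sent
    "Content-length: 20\r\n"          # how much data is being sent
    "\r\n"
    "<html>hello</html>\r\n"          # the body
)

# Split at the blank line separating header from body, then take the
# response line apart and turn the header lines into a dictionary.
head, _, body = raw_response.partition("\r\n\r\n")
status_line, *header_lines = head.split("\r\n")
version, code, reason = status_line.split(" ", 2)
headers = dict(line.split(": ", 1) for line in header_lines)

print(code, reason, headers["Content-type"])
```

Because HTTP is stateless, once this exchange completes the connection is closed; a second page requires a fresh request with the full structure repeated.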
Examples of "complex" message bodies include messages with embedded graphics or audio clips, messages with file attachments, messages in Japanese or Russian, and signed messages. MIME defines several new header fields: MIME-Version (identifying a MIME document), Content-Type and Content-Transfer-Encoding. The most interesting of these is Content-Type, which defines the content of the document and comes in seven pre-defined types, each of which has subtypes; an extension mechanism exists for defining new types and subtypes. Content-Transfer-Encoding defines several encoding mechanisms for binary data that might otherwise be difficult to transport.

Email Transport. Once email message headers have been parsed and recipient addresses identified, the message must actually be delivered across the Internet. This is done using DNS and the SMTP protocol. Sending email thus triggers a DNS lookup: first for an MX (mail exchanger) record, then, if that fails, a normal A (address) lookup. MX records contain a preference value (to choose among multiple MX records for a particular domain) and the DNS name of the host receiving mail for that domain. This procedure is documented in RFC 974. Messages for a particular user are typically placed in a mailbox, which is then accessed by the user using protocols such as POP and IMAP.

POP (Post Office Protocol, POP-3). POP, documented in RFC 1725, is designed for user-to-mailbox access. Facilities are provided for user authentication and mailbox manipulation. Authentication takes the form of a password transmitted as clear text.

IMAP (Internet Message Access Protocol). IMAP, documented in RFC 2060, represents an improvement over POP.

SMTP (Simple Mail Transfer Protocol). SMTP, documented in RFC 821, is the Internet's standard host-to-host mail transport protocol and traditionally operates over TCP on port 25. In other words, a UNIX user can type telnet hostname 25 and connect with an SMTP server, if one is present.
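The MIME headers described earlier can be seen by building a "complex" message with Python's standard email package; the standard smtplib module would then hand it to an SMTP server on port 25 for delivery. The addresses and host below are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"        # placeholder addresses
msg["To"] = "bob@example.com"
msg["Subject"] = "A complex message body"
msg.set_content("Plain-text part of the message.")

# Adding an attachment turns the single-part body into multipart/mixed,
# with MIME supplying the Content-Type and Content-Transfer-Encoding.
msg.add_attachment(b"fake image bytes",
                   maintype="image", subtype="png",
                   filename="picture.png")

print(msg["MIME-Version"])               # MIME header added automatically
print(msg.get_content_type())            # the top-level Content-Type

# Delivery would then use SMTP (placeholder mail host):
# import smtplib
# with smtplib.SMTP("mail.example.com", 25) as s:
#     s.send_message(msg)
```

The binary attachment is base64-encoded by the library, which is exactly the Content-Transfer-Encoding role: making binary data safe to carry over a transport designed for ASCII text.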
SMTP uses a style of asymmetric request-response protocol popular in the early 1980s and still seen occasionally, most often in mail protocols. If mail delivery fails, sendmail (the most important SMTP implementation) will queue the messages and retry delivery later, using a backoff algorithm. No mechanism exists to poll all Internet hosts for mail, nor does SMTP provide any mailbox facility or any special features beyond mail transport. For these reasons, SMTP is not a good choice for hosts situated behind highly unpredictable lines (such as modems).

Domain Names. Internet domains form the basis of the common Internet naming scheme. Domains are structured in the form of an inverted tree. Each branch or leaf on the tree is labeled with a simple alphanumeric string, and a complete domain name is written by stringing all the labels together, separated by periods. In a name such as www.iisc.com, the root domain is com, the second-level label is iisc, and the third level is www. Incidentally, there is no standard way to visually distinguish a branch from a leaf; in fact, the Internet domain system makes no distinction between the two, since branches can have any attributes of a leaf, and leaves can have children added to them and become branches.

Generic Domains:
com - Commercial
edu - Educational
org - Non-profit Organizations
net - Networking Providers
mil - Military
gov - Government
int - International Organizations
Country domains (two-letter country codes) exist as well.

To make use of domain names, they must be converted into 32-bit IP addresses. This is done using the DNS protocol. Domain name assignment is completely distinct from IP address assignment.

DNS (Domain Name Service). Domain naming, and its most visible component, the Domain Name Service (DNS), is critical to the operation of the Internet. DNS converts a name into the 32-bit IP address of the server that the name refers to. The crucial DNS documentation is provided in RFC 1034 and RFC 1035.

IP (Internet Protocol).
IP is the Internet's most basic protocol. In order to function in a TCP/IP network, a network segment's only requirement is to forward IP packets; in fact, a TCP/IP network can be defined as a communication medium that can transport IP packets. Almost all other TCP/IP functions are constructed by layering atop IP. IP is documented in RFC 791, and IP broadcasting procedures are discussed in RFC 919. IP is a datagram-oriented protocol, treating each packet independently, which means each packet must contain complete addressing information. Also, IP makes no attempt to determine whether packets reach their destination, and it takes no corrective action if they do not. Nor does IP checksum the contents of a packet, only the IP header.

TCP (Transmission Control Protocol). TCP, documented in RFC 793, makes up for IP's deficiencies by providing reliable, stream-oriented connections that hide most of IP's shortcomings. The protocol suite gets its name because most TCP/IP protocols are based on TCP, which is in turn based on IP. TCP and IP are the twin pillars of TCP/IP.

Hypertext. Hypertext is, most simply, a way of constructing documents that reference other documents. Within a hypertext document, a block of text can be tagged as a hypertext link pointing to another document; when viewed with a hypertext browser, the link can be activated to view the other document. Hypertext's original idea was to take advantage of electronic data processing to organize large quantities of information that would otherwise overwhelm a reader.

Search Engines: Google. What does "Google" mean? Google is a play on the word googol, which was coined by Milton Sirotta, nephew of the American mathematician Edward Kasner, to refer to the number represented by the numeral 1 followed by 100 zeros. A googol is a very large number: there isn't a googol of anything in the universe, not stars, not dust particles, not atoms.
Google's use of the term reflects the company's mission to organize the immense, seemingly infinite amount of information available on the web.

Links:
http://www.isoc.org/internet/
http://www.ietf.org/rfc.html
http://www.cis.ohio-state.edu/~jain/