Email is exchanged between hosts on the Internet using SMTP. While transferring an
email message, the sender-SMTP specifies the address of the sender, the address of the receiver and the data to be sent. The data is arbitrary
ASCII text. SMTP does not interpret the data in any way or specify a format for it.
During the early days of email, a need was felt for a header in the data which would contain information such as
the Subject, the Date when the mail was sent etc. Consequently, several informal standards were developed by individuals,
leading to incompatibilities. It was felt necessary to codify these practices and provide for those features that seemed imminent.
This resulted in RFC 733, which was later superseded by
RFC 822 (Standard for the Format of ARPA
Internet Text Messages). RFC 822 specifies a standard set of message headers, which are followed by the message content.
The problem with RFC 822 is that it allows message content consisting only of
ASCII text. MIME (Multipurpose Internet Mail Extensions)
overcomes this limitation and allows messages containing character sets other than ASCII, non-textual data (attachments), multi-part messages etc.
A message consists of header fields and, optionally, a body. The body is simply a sequence of lines containing ASCII characters.
It is separated from the headers by a null line. Each header field can be viewed as a single, logical line of
ASCII characters, comprising a field-name followed by a colon (":"), followed by a field-body. Depending on the field-name,
the field body may be Structured or Unstructured.
- Structured - For address fields, such as "To", a structure is predefined. The field body must
conform to the specification.
- Unstructured - For some fields, such as "Subject" and "Comments", no structuring is assumed, and they are treated simply as text.
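As a rough illustration of this layout, the sketch below uses Python's standard email package to split a small message into its header fields and body. The addresses and text are invented for the example. It shows that a structured field such as "To" can be parsed into (name, address) pairs, while an unstructured field such as "Subject" is simply returned as text.

```python
from email import message_from_string
from email.utils import getaddresses

# A made-up message: header fields, a blank ("null") line, then the body.
RAW = (
    "From: Alice Example <alice@example.org>\n"
    "To: Bob Example <bob@example.org>, carol@example.org\n"
    "Subject: Lunch on Friday?\n"
    "\n"
    "Are you free at noon?\n"
)

msg = message_from_string(RAW)

# Structured field: the field body follows the address syntax, so it can be parsed.
recipients = getaddresses(msg.get_all("To", []))

# Unstructured field: treated simply as text.
subject = msg["Subject"]

# Everything after the first blank line is the body.
body = msg.get_payload()

print(recipients)
print(subject)
print(body, end="")
```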
Not all header fields received by the receiver may have been specified by the sender. There is a distinction between "message" and "envelope"
headers. Briefly, the "envelope" headers are actually generated by the
machine that receives a message, rather than by the sender. The envelope headers are added at the beginning of the mail data. The SMTP
relay servers that handle the message on the way to the final recipient insert some header fields into the message header. For example:
- When the receiver-SMTP accepts a message, either for relaying or for final delivery, it inserts at the beginning of the mail data a
time stamp line (Received: header). The time stamp line indicates the identity of the host that sent the message, the identity of the host that received
the message (and is inserting this time stamp), and the date and time the message was received. Relayed messages will have multiple time
stamp lines. When the receiver-SMTP makes the "final delivery" of a message, it inserts at the beginning of the mail data a return path line (Return-Path: header).
- Some systems permit mail recipients to forward a message. This standard supports such a service, through the "Resent-" prefix to
field names. Whenever the string "Resent-" begins a field name, the field
has the same semantics as a field whose name does not have the
prefix. However, the message is assumed to have been forwarded
by an original recipient who attached the "Resent-" field. This
new field is treated as being more recent than the equivalent,
original field.
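The way time stamp lines accumulate can be sketched in a few lines of Python. The helper add_received below is hypothetical and heavily simplified (a real receiver-SMTP writes a richer Received: line), and the host names and times are invented. It only illustrates that each hop prepends its line, so the most recent hop ends up at the top of the mail data.

```python
from email import message_from_string
from email.utils import formatdate


def add_received(raw_message: str, from_host: str, by_host: str, when: float) -> str:
    """Prepend a simplified time stamp line, as a receiver-SMTP would (hypothetical helper)."""
    stamp = formatdate(when, localtime=False)
    return f"Received: from {from_host} by {by_host}; {stamp}\n" + raw_message


# Hypothetical message and hosts, used only for illustration.
original = "From: alice@example.org\nSubject: Hello\n\nHi there.\n"
hop1 = add_received(original, "client.example.org", "relay.example.org", 1_000_000_000)
hop2 = add_received(hop1, "relay.example.org", "mail.example.net", 1_000_000_060)

# Reading the headers top to bottom walks the route backwards, newest hop first.
trace = message_from_string(hop2).get_all("Received")
for line in trace:
    print(line)
```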
In addition to the predefined header fields, users are allowed to define and use their own header fields.
These fields may be used for transferring application-specific information. Such fields must have names
which are not already used in the current specification. The names of user-defined fields usually begin with "X-"
because it is guaranteed that predefined fields will never have names beginning with this string.
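A minimal sketch of a user-defined field, again using Python's standard email package. The field name X-Example-Build-Id and all values are invented for this example and do not belong to any standard.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.org"
msg["To"] = "bob@example.org"
msg["Subject"] = "Build finished"
# Hypothetical application-specific field; the name is invented for this example.
msg["X-Example-Build-Id"] = "build-1234"
msg.set_content("The nightly build completed.")

# Collect every user-defined field by its "X-" prefix (field names are case-insensitive).
custom_fields = {
    name: str(value)
    for name, value in msg.items()
    if name.lower().startswith("x-")
}
print(custom_fields)
```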
Some of the header fields specified by the standard are listed below:
(From the article Reading Email Headers,
by Nathan Tenny of Qualcomm (the publishers of Eudora email software))
- Apparently-To: Messages with many recipients sometimes have a long
list of headers of the form "Apparently-To: gaurav@iisc.ernet.in" (one line per
recipient). These headers are unusual in legitimate mail; they are normally
a sign of a mailing list, and in recent times mailing lists have generally used
software sophisticated enough not to generate a giant pile of headers.
- Bcc: (stands for "Blind Carbon Copy") If you see this header on
incoming mail, something is wrong. It's used like Cc: (see below), but does
not appear in the headers. The idea is to be able to send copies of
email to persons who might not want to receive replies or to appear in the
headers. Blind carbon copies are popular with spammers, since it confuses
many inexperienced users to get email that doesn't appear to be addressed to
them.
- Cc: (stands for "Carbon Copy", which is meaningful if you remember
typewriters) This header is sort of an extension of "To:"; it specifies
additional recipients. The difference between "To:" and "Cc:" is essentially
connotative; some mailers also deal with them differently in generating
replies.
- Comments: This is a nonstandard, free-form header field. It's most
commonly seen in the form "Comments: Authenticated sender is
<gaurav@iisc.ernet.in>". A header like this is added by some mailers
(notably the popular freeware program Pegasus) to identify the sender; however,
it is often added by hand (with false information) by spammers as well. Treat
with caution.
- Content-Transfer-Encoding: This header relates to MIME, a standard
way of enclosing non-text content in email. It has no direct relevance to the
delivery of mail, but it affects how MIME-compliant mail programs interpret the
content of the message.
- Content-Type: Another MIME header, telling MIME-compliant mail
programs what type of content to expect in the message.
- Date: This header does exactly what you'd expect: It specifies a
date, normally the date the message was composed and sent. If this header is
omitted by the sender's computer, it might conceivably be added by a mail
server or even by some other machine along the route. It shouldn't be treated
as gospel truth; forgeries aside, there are an awful lot of computers in the
world with their clocks set wrong.
- Errors-To: Specifies an address for mailer-generated errors, like
"no such user" bounce messages, to go to (instead of the sender's address).
This is not a particularly common header, as the sender usually wants to
receive any errors at the sending address, which is what most (essentially all)
mail server software does by default.
- From (without colon) This is the "envelope From" header discussed above.
This header is derived from the information in the SMTP "MAIL FROM" command. This header
is added by the final recipient of the message.
- From: (with colon) This is the "message From:" header. This header
is specified by the sender in the mail data that follows the SMTP "DATA" command. The sender can specify any value
in this header, so it is not reliable.
- Message-Id: (also Message-id: or Message-ID:) The Message-Id is a
more-or-less unique identifier assigned to each message, usually by the first
mailserver it encounters. Conventionally, it is of the form
"gibberish@csa.iisc.ernet.in", where the "gibberish" part could be absolutely
anything and the second part is the name of the machine that assigned the ID.
Sometimes, but not often, the "gibberish" includes the sender's username. Any
email in which the message ID is malformed (e.g., an empty string or no @
sign), or in which the site in the message ID isn't the real site of origin, is
probably a forgery.
- In-Reply-To: A Usenet header that occasionally appears in mail, the
In-Reply-To: header gives the message ID of some previous message which is
being replied to. It is unusual for this header to appear except in email
directly related to Usenet; spammers have been known to use it, probably in an
attempt to evade filtration programs.
- Mime-Version: (also MIME-Version:) Yet another MIME header, this
one just specifying the version of the MIME protocol that was used by the
sender. Like the other MIME headers, this one is usually eminently ignorable;
most modern mail programs will do the right thing with it.
- Newsgroups: This header only appears in email that is connected
with Usenet---either email copies of Usenet postings, or email replies to
postings. In the first case, it specifies the newsgroup(s) to which the
message was posted; in the second, it specifies the newsgroup(s) in which the
message being replied to was posted. The semantics of this header are the
subject of a low-intensity holy war, which effectively assures that both sets
of semantics will be used indiscriminately for the foreseeable future.
- Organization: A completely free-form header that normally contains
the name of the organization through which the sender of the message has net
access. The sender can generally control this header, and silly entries like
"Royal Society for Putting Things on Top of Other Things" are commonplace.
- Priority: An essentially free-form header that assigns a priority
to the mail. Most software ignores it. It is often used by spammers, usually
in the form "Priority: urgent" (or something similar), in an attempt to get
their messages read.
- Received: This header is inserted by the receiver-SMTP when it accepts a message, either for
relaying or for final delivery. It indicates the identity of the host that sent the message, the identity of the host that received
the message (and is inserting this header), and the date and time the message was received.
Relayed messages will have multiple such headers.
- References: The References: header is rare in email except for
copies of Usenet postings. Its use on Usenet is to identify the "upstream"
posts to which a message is a response; when it appears in email, it's usually
just a copy of a Usenet header. It may also appear in email responses to
Usenet postings, giving the message ID of the post being responded to as well
as the references from that post.
- Reply-To: Specifies an address for replies to go to. Though this
header has many legitimate uses (perhaps your software mangles your From:
address and you want replies to go to a correct address), it is also widely
used by spammers to deflect criticism. Occasionally a naive spammer will
actually solicit responses by email and use the Reply-To: header to collect
them, but more often the Reply-To: address in junk email is either invalid or
an innocent victim.
- Sender: This header is unusual in email (X-Sender: is usually used
instead), but appears occasionally, especially in copies of Usenet posts. It
should identify the sender; in the case of Usenet posts, it is a more reliable
identifier than the From: line.
- Subject: A completely free-form field specified by the sender,
intended, of course, to describe the subject of the message.
- To: The "message To:" header. This is inserted by the sender in the mail data.
Note that the To: header need not contain the recipient's address, because delivery is governed by the addresses given in the SMTP envelope.
- X-headers is the generic term for headers starting with a capital
X and a hyphen. The convention is that X-headers are nonstandard and provided
for information only, and that, conversely, any nonstandard informative header
should be given a name starting with "X-". This convention is frequently
violated.
- X-Confirm-Reading-To: This header requests an automated
confirmation notice when the message is received or read. It is typically
ignored; presumably some software acts on it.
- X-Distribution: In response to problems with spammers using his
software, the author of Pegasus Mail added this header. Any message sent with
Pegasus to a sufficiently large number of recipients has a header added that
says "X-Distribution: bulk". It is explicitly intended as something for
recipients to filter against.
- X-Errors-To: Like Errors-To:, this header specifies an address for
errors to be sent to. It is probably less widely obeyed.
- X-Mailer: (also X-mailer:) A freeform header field intended for the
mail software used by the sender to identify itself (as advertising or
whatever). Since much junk email is sent with mailers invented for the
purpose, this field can provide much useful fodder for filters.
- X-Priority: Another priority field, used notably by Eudora to
assign a priority (which appears as a graphical notation on the message).
- X-Sender: The usual email analogue to the Sender: header in Usenet
news, this header purportedly identifies the sender with greater reliability
than the From: header. In fact, it is nearly as easy to forge, and should
therefore be viewed with the same sort of suspicion as the From: header.
- X-UIDL: This is a unique identifier used by the POP protocol for
retrieving mail from a server. It is normally added between the recipient's
mail server and the recipient's actual mail software; if mail arrives at the
mail server with an X-UIDL: header, it is probably junk (there's no conceivable
use for such a header, but for some unknown reason many spammers add one).
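The Message-Id remark above (an empty string or a missing @ sign is suspicious) can be turned into a small syntactic check. This is only a sketch: the function name is made up, the domain is an example, and a check like this cannot tell whether the site named in the ID is the real site of origin.

```python
from email.utils import make_msgid


def message_id_looks_wellformed(value: str) -> bool:
    """Return False for an empty ID or one lacking text on either side of the '@'."""
    inner = value.strip().strip("<>")
    local, at, host = inner.rpartition("@")
    return bool(local and at and host)


# make_msgid generates an ID of the form <unique-part@domain>; the domain is an example.
generated = make_msgid(domain="example.org")

print(generated, message_id_looks_wellformed(generated))
print(message_id_looks_wellformed(""))
print(message_id_looks_wellformed("<no-at-sign>"))
```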
A sample email header from a mail I sent to myself is given below:
From malhotra_g@vsnl.net Fri Nov 7 19:17:53 2003
Return-Path: <malhotra_g@vsnl.net>
Delivered-To: gaurav@deimos.csa.iisc.ernet.in
Received: from csa.iisc.ernet.in (csa.iisc.ernet.in [144.16.67.8])
by deimos.csa.iisc.ernet.in (Postfix) with ESMTP id 8EF794E662
for <gaurav@deimos.csa.iisc.ernet.in> ; Fri, 7 Nov 2003 19:17:53 +0530 (IST)
Received: by csa.iisc.ernet.in (Postfix)
id 30AA52BDF9; Fri, 7 Nov 2003 18:57:32 +0530 (IST)
Delivered-To: gaurav@csa.iisc.ernet.in
Received: from smtp2.vsnl.net (smtp2.vsnl.net [203.200.235.232])
by csa.iisc.ernet.in (Postfix) with ESMTP id D45FA2BC9F
for <gaurav@csa.iisc.ernet.in> ; Fri, 7 Nov 2003 18:57:31 +0530 (IST)
Received: from vsnl.net ([127.0.0.1])
by smtp2.vsnl.net (iPlanet Messaging Server 5.2 HotFix 1.16 (built May 14
2003)) with ESMTP id <0HNZ00HE9I8UA7@smtp2.vsnl.net> for
gaurav@csa.iisc.ernet.in; Fri, 07 Nov 2003 19:16:06 +0530 (IST)
Received: from ([172.16.28.141])
by smtp2.vsnl.net (InterScan E-Mail VirusWall Unix); Fri,
07 Nov 2003 19:16:06 +0530 (IST)
Received: from [172.16.28.182] by pop2.vsnl.net (mshttpd); Fri,
07 Nov 2003 18:46:06 +0500
Date: Fri, 07 Nov 2003 18:46:06 +0500
From: malhotra_g@vsnl.net
Subject: Test Mail
To: gaurav@csa.iisc.ernet.in
Message-id: <4d318d4d0507.4d05074d318d@vsnl.net>
X-Mailer: iPlanet Messenger Express 5.2 HotFix 1.16 (built May 14 2003)
Content-type: text/plain; charset=us-ascii
Content-language: en
Content-transfer-encoding: 7BIT
Content-disposition: inline
X-Accept-Language: en
Priority: normal
Status: RO
X-Status:
X-Keywords:
X-UID: 218
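The time stamps in this sample can be compared by converting them to a common time zone. The sketch below copies four date strings from the header above and uses Python's email.utils.parsedate_to_datetime. The labels are just names chosen for the printout.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

# Date strings copied from the sample header above, in the order the message travelled.
STAMPS = [
    ("Date: header", "Fri, 07 Nov 2003 18:46:06 +0500"),
    ("smtp2.vsnl.net", "Fri, 07 Nov 2003 19:16:06 +0530 (IST)"),
    ("csa.iisc.ernet.in", "Fri, 7 Nov 2003 18:57:31 +0530 (IST)"),
    ("deimos.csa.iisc.ernet.in", "Fri, 7 Nov 2003 19:17:53 +0530 (IST)"),
]

# Convert every stamp to UTC so that different zone offsets can be compared.
utc_times = [
    (label, parsedate_to_datetime(text).astimezone(timezone.utc))
    for label, text in STAMPS
]

for label, moment in utc_times:
    print(f"{label:26} {moment:%H:%M:%S} UTC")
```

Once the offsets are taken into account, the Date: header and the smtp2.vsnl.net stamp denote the same instant, while the stamp added by csa.iisc.ernet.in is earlier than both. This suggests that at least one of the clocks involved was set wrong, which is one reason not to treat these dates as gospel truth.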
MIME extends the format of Internet mail, as specified by RFC 822, to allow non-US-ASCII textual messages,
non-textual messages, multipart message bodies, and non-US-ASCII information in message headers.
Specifically, MIME messages can contain text, images, audio, video, or other application-specific data.
To allow mail readers to recognise email messages that use MIME, some new MIME-specific header fields were
defined. This allows email applications to distinguish between MIME and non-MIME messages so that each can
be appropriately processed.
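One simple way an application might make this distinction is to look for the MIME-Version header field. The helper is_mime below is a hypothetical sketch, not the test any particular mail reader uses, and both messages are invented.

```python
from email import message_from_string


def is_mime(raw_message: str) -> bool:
    """Treat a message as MIME if it carries a MIME-Version header field."""
    # Header lookups on a parsed message are case-insensitive.
    return "MIME-Version" in message_from_string(raw_message)


plain = "From: alice@example.org\nSubject: Old style\n\nJust ASCII text.\n"
mime = (
    "From: alice@example.org\n"
    "Mime-Version: 1.0\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "Text with MIME headers.\n"
)

print(is_mime(plain), is_mime(mime))
```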
MIME defines the following new header fields:
(from MIME Overview, by Mark Grand)
- The MIME-Version header field, which uses a version number to
declare that a message conforms to the MIME standard.
- The Content-Type header field, which can be used to specify the
type and subtype of data in the body of a message and to fully specify
the encoding of such data. The Content-Type can also have an associated subtype.
- The Content-Type value Text, which can be used to represent
textual information in a number of character sets and formatted text description
languages in a standardized manner.
- The Content-Type value Multipart, which can be used to
combine several body parts, possibly of differing types of data, into a
single message.
- The Content-Type value Application, which can be used
to transmit application data or binary data.
- The Content-Type value Message, for encapsulating a mail
message.
- The Content-Type value Image, for transmitting still
image (picture) data.
- The Content-Type value Audio, for transmitting audio
or voice data.
- The Content-Type value Video, for transmitting video
or moving image data, possibly with audio as part of the composite video
data format.
- The Content-Transfer-Encoding header field, that specifies how
the data is encoded to allow it to pass through mail transports having
data or character set limitations. Since SMTP was designed to carry US-ASCII text messages, binary data
such as audio, video, images etc. have to be suitably encoded before they can be transferred. The possible
values for the Content-Transfer-Encoding field are:
  - BASE64
  - QUOTED-PRINTABLE
  - 8BIT
  - 7BIT
  - BINARY
  - x-EncodingName (non-standard encoding)
- Two header fields that can be used to further identify and describe the
data in a message body: the Content-ID and Content-Description
header fields.
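To give a feel for the two encodings that actually transform the data, the sketch below applies BASE64 and QUOTED-PRINTABLE to a short non-ASCII string using Python's standard base64 and quopri modules. The sample text is arbitrary.

```python
import base64
import quopri

# Five bytes of UTF-8 text, two of them outside the US-ASCII range.
data = "caf\u00e9".encode("utf-8")

b64 = base64.b64encode(data)    # BASE64: every 3 input bytes become 4 ASCII characters
qp = quopri.encodestring(data)  # QUOTED-PRINTABLE: non-ASCII bytes become =XX escapes

print(b64)
print(qp)

# Both encodings are reversible, so the receiver can recover the original bytes.
restored_from_b64 = base64.b64decode(b64)
restored_from_qp = quopri.decodestring(qp)
```

BASE64 suits arbitrary binary data, while QUOTED-PRINTABLE leaves mostly-ASCII text largely readable in its encoded form.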
MIME allows messages to contain multiple objects. When multiple objects are in a MIME message,
they are represented in a form called a body part. A body part has a header and a body, so it makes
sense to speak about the body of a body part. Also, body parts can be nested in bodies that contain
one or multiple body parts. The Content-type value Multipart is used to
encapsulate multiple body parts in a single body. The interested reader can refer to the MIME RFCs or the
MIME Overview by Mark Grand for technical details
of exactly how MIME works.
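As a sketch of how such a message might be put together in practice, the following uses Python's email.message.EmailMessage. The addresses, file name and payload are invented. The library picks the boundary string and the Content-Transfer-Encoding on its own, so the output will not match any particular mail program byte for byte.

```python
from email.message import EmailMessage

payload = bytes(range(16))  # stand-in for binary file contents

msg = EmailMessage()
msg["From"] = "alice@example.org"
msg["To"] = "bob@example.org"
msg["Subject"] = "Testing"

# The first body part: plain text.
msg.set_content("This is a test.")

# Adding an attachment turns the message into Multipart/Mixed with two body parts.
msg.add_attachment(
    payload,
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

part_types = [part.get_content_type() for part in msg.iter_parts()]
attachment = next(msg.iter_attachments())

print(msg.get_content_type())
print(part_types)
print(attachment["Content-Transfer-Encoding"])
```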
A sample Multipart MIME Email message is given below (some lines have been deleted for clarity):
From gaurav@csa.iisc.ernet.in Fri Nov 14 16:04:00 2003
Return-Path: <gaurav@csa.iisc.ernet.in>
Delivered-To: gaurav@csa.iisc.ernet.in
Received: from deimos.csa.iisc.ernet.in (deimos.csa.iisc.ernet.in [144.16.67.57])
by csa.iisc.ernet.in (Postfix) with ESMTP id 7F9B12B992
for <gaurav@csa.iisc.ernet.in>; Fri, 14 Nov 2003 15:45:05 +0530 (IST)
Received: by deimos.csa.iisc.ernet.in (Postfix, from userid 9408)
id EA8F24E659; Fri, 14 Nov 2003 16:03:59 +0530 (IST)
Received: from localhost (localhost [127.0.0.1])
by deimos.csa.iisc.ernet.in (Postfix) with ESMTP id D57994A27C
for <gaurav@csa.iisc.ernet.in>; Fri, 14 Nov 2003 16:03:59 +0530 (IST)
Date: Fri, 14 Nov 2003 16:03:59 +0530 (IST)
From: Gaurav Malhotra <gaurav@csa.iisc.ernet.in>
To: Gaurav Malhotra <gaurav@csa.iisc.ernet.in>
Subject: Testing
Message-ID: <Pine.LNX.4.58.0311141602430.11137@deimos.csa.iisc.ernet.in>
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="277887299-1503234992-1068806039=:11137"
Status: O
X-Status:
X-Keywords:
X-UID: 226

This message is in MIME format. The first part should be readable text,
while the remaining parts are likely unreadable without MIME-aware tools.
Send mail to mime@docserver.cac.washington.edu for more info.
--277887299-1503234992-1068806039=:11137
Content-Type: TEXT/PLAIN; charset=US-ASCII

This is a test.
--277887299-1503234992-1068806039=:11137
Content-Type: IMAGE/jpeg; name="911c4s1_10x7.jpg"
Content-Transfer-Encoding: BASE64
Content-ID: <Pine.LNX.4.58.0311141603590.11137@deimos.csa.iisc.ernet.in>
Content-Description:
Content-Disposition: attachment; filename="911c4s1_10x7.jpg"

/9j/4AAQSkZJRgABAgAAZABkAAD/7AARRHVja3kAAQAEAAAAPAAA/+4ADkFk
b2JlAGTAAAAAAf/bAIQABgQEBAUEBgUFBgkGBQYJCwgGBggLDAoKCwoKDBAM
DAwMDAwQDA4PEA8ODBMTFBQTExwbGxscHx8fHx8fHx8fHwEHBwcNDA0YEBAY
GhURFRofHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8f
Hx8fHx8fHx8f/8AAEQgDAAQAAwERAAIRAQMRAf/EALEAAQADAAMBAQEAAAAA
*******some lines have been deleted for clarity******
SBPHq9oFkl6gJoASCLUQVKQE0AkCaAKASBPUBNUBJQIJKJQBgK8SCavpAmvD
vAmoCrAmvUA1ANQB9PAoVIFeJR//2Q==
--277887299-1503234992-1068806039=:11137
Content-Type: TEXT/plain; charset=US-ASCII; name="proxies.txt"
Content-Transfer-Encoding: BASE64
Content-ID: <Pine.LNX.4.58.0311141603591.11137@deimos.csa.iisc.ernet.in>
Content-Description:
Content-Disposition: attachment; filename="proxies.txt"

ZWNlIDogIDE0NC4xNi42NC40IDogICAgMzEyOA0Kc2VyYzogIDE0NC4xNi43
OS41ODoJMzEyOCAgIG9ubHkgYWZ0ZXIgNiBpbiB0aGUgZXZlbmluZyB0byA5
IGluIG1vcm5pbmcNCmNzYSA6ICAxNDQuMTYuNjcuOCA6CTgwODANCg==
--277887299-1503234992-1068806039=:11137--
In the example given above, the main message body is defined as Multipart/Mixed. It consists of three body parts. The first body part contains
plain text. The second part contains a JPEG image file encoded using BASE64 and the third part contains plain text, also encoded
as BASE64.
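The sample can also be taken apart programmatically. The sketch below feeds a trimmed copy of it to Python's email parser. As assumptions for brevity, the image part and most headers are left out, and the blank lines that separate each header block from its body are written explicitly. The parser splits the body at the boundary and undoes the BASE64 encoding of the attached text file.

```python
from email import message_from_string
from email.policy import default

BOUNDARY = "277887299-1503234992-1068806039=:11137"

# Trimmed copy of the sample message: the text part and the proxies.txt part only.
RAW = (
    "MIME-Version: 1.0\n"
    f'Content-Type: MULTIPART/MIXED; BOUNDARY="{BOUNDARY}"\n'
    "\n"
    f"--{BOUNDARY}\n"
    "Content-Type: TEXT/PLAIN; charset=US-ASCII\n"
    "\n"
    "This is a test.\n"
    f"--{BOUNDARY}\n"
    'Content-Type: TEXT/plain; charset=US-ASCII; name="proxies.txt"\n'
    "Content-Transfer-Encoding: BASE64\n"
    'Content-Disposition: attachment; filename="proxies.txt"\n'
    "\n"
    "ZWNlIDogIDE0NC4xNi42NC40IDogICAgMzEyOA0Kc2VyYzogIDE0NC4xNi43\n"
    "OS41ODoJMzEyOCAgIG9ubHkgYWZ0ZXIgNiBpbiB0aGUgZXZlbmluZyB0byA5\n"
    "IGluIG1vcm5pbmcNCmNzYSA6ICAxNDQuMTYuNjcuOCA6CTgwODANCg==\n"
    f"--{BOUNDARY}--\n"
)

msg = message_from_string(RAW, policy=default)
parts = list(msg.iter_parts())

first_text = parts[0].get_content()   # the plain-text body part
filename = parts[1].get_filename()    # taken from Content-Disposition
decoded = parts[1].get_content()      # BASE64 is decoded by the library

print(msg.get_content_type(), len(parts))
print(first_text.strip())
print(filename)
print(decoded)
```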