On the Internet, data is sent as 8-bit bytes. The receiving software collects
these bytes and assembles them in the proper order. Now the question becomes,
"What do these bytes represent?" Are they text? Are they a picture?
Are they a sound? How is it possible to know what they mean? Suppose that, in
addition to the bytes that carry the content itself, a few extra bytes are
sent along saying what the data is. If this is done, the receiving software or
person knows what is in the bunch of bytes that make up the message. This is
what MIME does. It tells what is in a message so that the message contents can
be used in an appropriate way.
When using the MIME standard, messages can contain the following types:
· Text messages in US-ASCII.
· Character sets other than US-ASCII.
· Multi-media: Image, Audio, and Video messages.
· Multiple objects in a single message.
· Multi-font messages.
· Messages of unlimited length.
· Binary files.
The basic Internet standard for message format, defined in RFC 822,
and the Simple Mail Transfer Protocol (SMTP), defined in RFC 821,
are widely used around the world these days (over 1,000,000 computers).
The main restrictions of these standards are:
· The message may contain only US-ASCII characters.
· The maximum line length allowed is 1000 characters.
· The message must not be longer than a predefined maximum size.
Before MIME was defined, the main alternative to Internet mail was the X.400
Message Handling System. Now, MIME and X.400 are competing for world dominance.
Our impression is that MIME has a chance of winning the battle, due to the fact
that it works.
Some of the most important innovations in the MIME standard are:
· It explicitly describes the set of allowable Content-types.
· It defines a subtype mechanism for Content-types.
· It defines a new Content-type, "multipart", which can be used to encapsulate
  several body parts within a single message body.
· It provides for standardized encoding of non-ASCII data.
· It explicitly addresses the issue of non-ASCII character sets.
MIME defines the following new header fields:
1. MIME-Version, which uses a version number to declare that a message
   conforms to the MIME standard.
2. Content-Type, which can be used to specify the type and subtype of data in
   the body of a message and to fully specify the encoding of such data. It
   also includes a subtype option. The seven Content-types specified are:
   1. Text - to represent textual information in a number of character sets.
   2. Image - for transmitting still image (picture) data.
   3. Audio - for transmitting audio or voice data.
   4. Video - for transmitting video or moving image data.
   5. Message - for encapsulating a mail message.
   6. Multipart - to combine several body parts, possibly of different types
      of data, into a single message.
   7. Application - to transmit application data or binary data.
3. Content-Transfer-Encoding, which specifies how the data is encoded to allow
   it to pass through mail transports having data or character set limitations.
4. Content-ID (optional), which enables labeling bodies, thus allowing one body
   to reference another.
5. Content-Description (optional), which enables associating descriptive
   information with a body.
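The five header fields above can be seen on a real message. The following is a minimal sketch using Python's standard `email` package; the subject text, the Content-ID value and the description are made-up placeholders, and the exact Content-Type parameters chosen by the library (such as the charset) are a property of that library, not of the MIME standard.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Header field demo"          # placeholder subject

# set_content() fills in Content-Type, Content-Transfer-Encoding
# and MIME-Version for a plain text body.
msg.set_content("Hello, MIME.")

# The two optional fields are added by hand (placeholder values).
msg["Content-ID"] = "<part1@example.invalid>"
msg["Content-Description"] = "A short greeting"

for name in ("MIME-Version", "Content-Type", "Content-Transfer-Encoding",
             "Content-ID", "Content-Description"):
    print(f"{name}: {msg[name]}")
```

Printing the message with `print(msg.as_string())` shows the same fields in their on-the-wire form, followed by a blank line and the body.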
The 7 Content-types defined in MIME are:
1. Text - this is the default type. Used to represent textual information in a
   number of character sets. See subtypes defined.
2. Image - this type is for transmitting still images. See subtypes defined.
3. Audio - this content type is for transmitting audio or voice data. See
   subtypes defined.
4. Video - the Video content type is for transmission of video data or moving
   image data. See subtypes defined.
5. Message - used to encapsulate an entire RFC 822 format message. See
   subtypes defined.
6. Multipart - used to combine several body parts of possibly different types
   and subtypes. See subtypes defined.
7. Application - can be used to transmit application data (such as
   executables) or binary data. See subtypes defined.
The subtype
mechanism was defined in order to enable many more types, while leaving the
readers a good prospect of recognizing the content-type.
Following are the subtypes defined for each of the main Content-types:
o Text/Plain - indicates plain/unformatted text, as defined in RFC 822
  (which is the default value).
o Text/RichText - indicates a simple portable word processing format that is
  defined by the MIME standard.
o Image/Jpeg - indicates an image in the JPEG format.
o Image/Gif - indicates an image in the GIF format.
o Audio/Basic - denotes single-channel 8000 Hz audio data.
o Video/MPEG - video coded according to the MPEG standard.
o Message/RFC822 - indicates that the body contains an encapsulated message,
  with the syntax of RFC 822.
o Message/Partial - indicates that the body contains a fragment of a larger
  message. Parameters:
  § ID - unique identifier. Used to match the parts of a message together.
  § Number - indicates which part of the message it is.
  § Total - total number of parts. Required in the final part. In the other
    parts it is optional.
o Message/External-Body - shows that the actual body data is not included, but
  only referenced.
o Multipart/Mixed - used to indicate multiple independent body parts to be
  viewed serially.
o Multipart/Alternative - each part is an "alternative" version of the same
  information.
o Multipart/Parallel - in a parallel body, all the body parts are intended to
  be presented simultaneously on hardware and software that are capable of
  doing so. Syntactically identical to Multipart/Mixed.
o Multipart/Digest - each of the body parts is an RFC 822 mail message.
o Application/Octet-Stream - indicates uninterpreted binary data. Parameters:
  § Name - a suggested name for the binary data if stored as a file.
  § Type - the general type or category of the binary data.
  § Padding - the number of padding bits appended to produce the enclosed
    byte-oriented data.
o Application/PostScript - indicates that a body contains a PostScript
  document.
8. X-Typename - Non-standard content-types can be used, but must be given
   names starting with "X-". This indicates a private type value, to be used
   by consenting mail systems by mutual agreement. The standard specifies no
   subtypes. This same mechanism can be used on the subtype portion of the
   field to specify private subtype values.
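To illustrate how a Content-Type value breaks down into type, subtype and parameters, here is a small sketch built on Python's standard `email.message.Message` class. The helper name `describe` and the sample header values (including the `id` string) are invented for this example.

```python
from email.message import Message

def describe(content_type_value):
    """Split a Content-Type value into (type, subtype, parameters)."""
    m = Message()
    m["Content-Type"] = content_type_value
    # get_params() returns the type/subtype itself as its first entry,
    # so the real parameters start at index 1.
    params = dict(m.get_params()[1:])
    return m.get_content_maintype(), m.get_content_subtype(), params

# A fragment of a larger message, using the Message/Partial parameters.
print(describe('message/partial; id="abc@example.invalid"; number=2; total=3'))

# Type and subtype names are case-insensitive; the library lowercases them.
print(describe("Text/Plain"))

# A private type and subtype, both marked with the "X-" prefix.
print(describe("X-MyType/X-MySubtype"))
```

Note that parameter values come back as strings, so a receiver that reassembles Message/Partial fragments would still have to convert `number` and `total` to integers itself.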
The multipart message is one of the important extensions of RFC 822.
This content-type is to be used to pack several parts, of possibly differing
types and subtypes, into a single RFC 822 message body.
The
Content-type field specifying type multipart also includes a delimiter
(boundary), which is used to separate each consecutive body part. Each body
part is itself structured more or less as an RFC 822 message in miniature - in
particular, possibly containing its own Content-type field to describe the type
of the part.
An expected use
of subtypes of multipart is to add further structure to the parts, to permit a
more integrated structure of multipart messages among cooperating user agents.
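The boundary mechanism described above can be made visible with a short sketch using Python's standard `email` package. The boundary string `example-boundary`, the file name `data.bin` and the payloads are arbitrary choices for illustration; in practice the library or mail program usually generates a random boundary.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content("Plain text part.")

# Adding an attachment turns the message into multipart/mixed.
msg.add_attachment(b"\x00\x01\x02",
                   maintype="application", subtype="octet-stream",
                   filename="data.bin")

# Fix the delimiter so it is easy to spot in the output.
msg.set_boundary("example-boundary")

# Each body part carries its own Content-Type, like a small message.
for part in msg.iter_parts():
    print(part.get_content_type())

wire_form = msg.as_string()
print(wire_form)
```

In the printed form, every body part is preceded by a line `--example-boundary`, and the last part is followed by `--example-boundary--`.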
Many Content-types are represented as 8-bit character or binary data. Such data
cannot be transmitted over some transport protocols.
The Content-Transfer-Encoding header field is used to indicate the type of
transformation that has been used for re-encoding such data into 7-bit
short-line format, in order to represent the body in an acceptable manner for
transport.
The possible values defined for this field are:
1. BASE64
2. Quoted-Printable
3. 8-bit
4. 7-bit
5. Binary
6. X-token
Each encoding method is briefly described below:
BASE64 - Implies an encoding that consists of lines no longer than 76 ASCII
characters. Suited for representing binary files. Every sequence of three bytes
is represented as four printable ASCII characters.
Quoted-Printable - The encoding scheme implied by this value is most
appropriate for data that consists primarily of printable ASCII characters.
Using this encoding method, printable ASCII characters are represented as
themselves. The encoding consists of lines no longer than 76 ASCII characters.
8-bit - The lines are short, but there may be non-ASCII characters.
7-bit - The data is all represented as short lines of US-ASCII data.
Binary - Allows for non-ASCII characters and longer lines (i.e., the lines
might not be short enough for SMTP transport).
X-token - In order to define private Content-Transfer-Encoding values it is
possible to use the special X-token mechanism, in which the private encoding
type name is prefixed by "X-" to indicate its non-standard status.
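The two re-encoding schemes, BASE64 and Quoted-Printable, can be tried out with Python's standard `base64` and `quopri` modules. This is only a sketch; the sample inputs are arbitrary, and Latin-1 is used for the text sample simply so that it contains one non-ASCII byte.

```python
import base64
import quopri

# BASE64: three input bytes become four printable ASCII characters.
b64 = base64.b64encode(b"Man")
print(b64)

# encodebytes() also breaks the output into lines of at most 76 characters.
wrapped = base64.encodebytes(bytes(range(100)))
print(wrapped.decode("ascii"))

# Quoted-Printable: printable ASCII stays as it is, and the single
# non-ASCII byte (0xE9) is written as the escape sequence "=E9".
text = "café au lait".encode("latin-1")
qp = quopri.encodestring(text)
print(qp)

# Both encodings are reversible.
restored_b64 = base64.b64decode(b64)
restored_qp = quopri.decodestring(qp)
```

The Quoted-Printable output remains largely readable to the eye, which is why it suits mostly-ASCII text, while BASE64 has a fixed overhead of about one third and suits arbitrary binary data.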
A technique to allow the encoding of non-ASCII text in various portions of an
RFC 822 message header, in a manner which is unlikely to confuse existing
message handling software, is described in RFC 1522 - "MIME Part Two: Message
Header Extensions for Non-ASCII Text".
The chosen
technique is based on certain sequences of "ordinary" printable ASCII
characters (known as "encoded-words")
being reserved for use as encoded data.
Generally, an
"encoded-word" is a sequence of printable ASCII characters that
begins with "=?", ends with "?=", and has two
"?"s in between. It specifies a character set and an encoding method,
and also includes the original text encoded as graphic ASCII characters,
according to the rules for that encoding method.
A mail
composer that implements the RFC 1522 specification will provide a means of
inputting non-ASCII text in header fields, but will translate these fields into
encoded-words before inserting them into the message header.
A mail
reader that implements the specification will recognize encoded-words when
they appear in certain portions of the message header. Instead of displaying
the encoded-word "as is", it will reverse the encoding and display
the original text in the designated character set.
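The composer and reader behaviour described above can be sketched with Python's standard `email.header` module, which implements the encoded-word mechanism. The sample word and the choice of the ISO-8859-1 character set are assumptions made for the example.

```python
from email.header import Header, decode_header

# Composer side: non-ASCII header text is turned into an encoded-word.
encoded = Header("p\xf6stal", "iso-8859-1").encode()
print(encoded)    # =?iso-8859-1?q?p=F6stal?=

# Reader side: recognize the encoded-word and reverse the encoding.
(raw, charset), = decode_header(encoded)
decoded = raw.decode(charset)
print(decoded)    # pöstal
```

The encoded form shows the structure given in the text: it begins with "=?", ends with "?=", and the two "?" characters in between separate the character set (`iso-8859-1`), the encoding method (`q`, a variant of Quoted-Printable) and the encoded text.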