Multipurpose Internet Mail Extensions

What is MIME?

On the Internet, data is sent as 8-bit bytes. The receiving software collects these bytes and assembles them in the proper order. The question then becomes, "What do these bytes represent?" Are they text? A picture? A sound? How is it possible to know what they mean? Suppose that, in addition to the bytes that carry the data, a few extra bytes are sent along saying what the data is. The receiving software or person then knows what is in the bunch of bytes that make up the message. This is what MIME does: it tells what is in a message so that the message contents can be used in an appropriate way.

When using the MIME standard, messages can contain the following types:

· Text messages in US-ASCII.

· Character sets other than US-ASCII.

· Multi-media: Image, Audio, and Video messages.

· Multiple objects in a single message.

· Multi-font messages.

· Messages of unlimited length.

· Binary files.

Deficiencies of existing standards

The basic Internet standard for message format, defined in RFC 822, and the Simple Mail Transfer Protocol (SMTP), defined in RFC 821, are widely used around the world (on over 1,000,000 computers).

The main restrictions of these standards are:

· The message may contain only US-ASCII characters

· The maximum line length allowed is 1000 characters

· The message must not be longer than a predefined maximum size
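The three restrictions above can be checked mechanically. The following is a minimal sketch, not part of either RFC: the function name `find_violations` and the default `max_size` of 100,000 bytes are assumptions chosen for illustration, since the real size limit depends on the mail system.

```python
MAX_LINE_LENGTH = 1000  # line limit quoted in the list above


def find_violations(raw: bytes, max_size: int = 100_000) -> list:
    """Return human-readable problems found in a raw mail message.

    max_size is a hypothetical limit; each mail system defines its own.
    """
    problems = []
    if len(raw) > max_size:
        problems.append(f"message is {len(raw)} bytes, limit is {max_size}")
    # Mail lines are terminated by CRLF on the wire.
    for number, line in enumerate(raw.split(b"\r\n"), start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(f"line {number} is {len(line)} characters long")
        if any(byte > 127 for byte in line):
            problems.append(f"line {number} contains non-ASCII bytes")
    return problems


if __name__ == "__main__":
    sample = b"Subject: caf\xc3\xa9\r\n\r\n" + b"x" * 1500 + b"\r\n"
    for problem in find_violations(sample):
        print(problem)
```

A message that fails any of these checks is exactly the kind of content that MIME re-encodes so that it can still travel over RFC 821 transports.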

Before MIME was defined, the main alternative to Internet mail was the X.400 Message Handling System. Now MIME and X.400 are competing for world dominance. Our impression is that MIME has a chance of winning the battle, because it works.

Technical Overview

Some of the most important innovations in the MIME standard are:

· It explicitly describes the set of allowable Content-types.

· It defines a subtype mechanism for Content-types.

· It defines a new Content-type, "multipart", which can be used to encapsulate several body parts within a single message body.

· It provides for standardized encoding of non-ASCII data.

· It explicitly addresses the issue of non-ASCII character sets.

MIME defines the following new header fields:

1. MIME-Version, which uses a version number to declare that a message conforms to the MIME standard.

2. Content-Type, which can be used to specify the type and subtype of data in the body of a message and to fully specify the encoding of such data. The seven Content-types specified are:

1. Text - to represent textual information in a number of character sets.

2. Image - for transmitting still image (picture) data.

3. Audio - for transmitting audio or voice data.

4. Video - for transmitting video or moving image data.

5. Message - for encapsulating a mail message.

6. Multipart - to combine several body parts, possibly of different types of data, into a single message.

7. Application - to transmit application data or binary data.

3. Content-Transfer-Encoding, which specifies how the data is encoded to allow it to pass through mail transports having data or character set limitations.

4. Content-ID (optional), which enables labeling bodies, thus allowing one body to reference another.

5. Content-Description (optional), which enables associating descriptive information with a body.
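To see these header fields on a real message, one option is Python's standard `email` package. The sketch below is only an illustration: the addresses, subject, and the helper name `build_demo_message` are made up, and the transfer encoding is requested explicitly so that the output is predictable.

```python
from email.message import EmailMessage


def build_demo_message() -> EmailMessage:
    """Build a small message and let the library add the MIME headers."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"      # hypothetical address
    msg["To"] = "recipient@example.org"     # hypothetical address
    msg["Subject"] = "MIME header demo"
    # Non-ASCII text: set_content adds MIME-Version, Content-Type (with a
    # charset parameter) and Content-Transfer-Encoding.
    msg.set_content("Grüße aus Köln\n", cte="quoted-printable")
    # Content-Description is optional and free-form.
    msg["Content-Description"] = "A short greeting"
    return msg


if __name__ == "__main__":
    demo = build_demo_message()
    for name in ("MIME-Version", "Content-Type",
                 "Content-Transfer-Encoding", "Content-Description"):
        print(f"{name}: {demo[name]}")
```

Content-ID is left out here because it is only useful when one body needs to reference another.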

Types

The 7 Content-types defined in MIME are:

1. Text - this is the default type. Used to represent textual information in a number of character sets. See subtypes defined.

2. Image - this type is for transmitting still images. See subtypes defined.

3. Audio - this content type is for transmitting audio or voice data. See subtypes defined.

4. Video - The Video content type is for transmission of video data or moving image data. See subtypes defined.

5. Message - used to encapsulate an entire RFC 822 format message. See subtypes defined.

6. Multipart - used to combine several body parts of possibly different types and subtypes. See subtypes defined.

7. Application - Can be used to transmit application data (such as executables) or binary data. See subtypes defined.

Subtypes

The subtype mechanism was defined in order to enable many more types, while leaving the readers a good prospect of recognizing the content-type.

Following are the subtypes defined for each of the main Content-types:

1. Text

o Text/Plain - indicates plain (unformatted) text, as defined in RFC 822. This is the default value.

o Text/RichText - indicates a simple portable word processing format that is defined by the MIME standard.

2. Image

o Image/Jpeg - indicates an image in the JPEG format.

o Image/Gif - indicates an image in the GIF format.

3. Audio

o Audio/Basic - denotes single-channel audio data sampled at 8000 Hz.

4. Video

o Video/MPEG - video coded according to the MPEG standard.

5. Message

o Message/RFC822 - indicates that the body contains an encapsulated message, with the syntax of RFC 822.

o Message/Partial - indicates that the body contains a fragment of a larger message.
Parameters:

§ ID - unique identifier. Used to match the parts of a message together.

§ Number - indicates which part of a message it is.

§ Total - total number of parts. Required in the final part. In the other parts it is optional.

o Message/External-Body - shows that the actual body data is not included, but only referenced.

6. Multipart

o Multipart/Mixed - used to indicate multiple independent body parts to be viewed serially.

o Multipart/Alternative - Each part is an "alternative" version of the same information.

o Multipart/Parallel - in a parallel body, all the body parts are intended to be present simultaneously on hardware and software that are capable of doing so. Syntactically identical to the Multipart/Mixed.

o Multipart/Digest - each of the body parts is an RFC 822 mail message.

7. Application

o Application/Octet-Stream - Indicates uninterpreted binary data.
Parameters:

§ Name - A suggested name for the binary data if stored as a file.

§ Type - The general type or category of the binary data.

§ Padding - The number of padding bits that were appended to the original bit stream to produce the enclosed byte-oriented data.

o Application/PostScript - Indicates that a body contains a PostScript document.

8. X-Typename
Non-standard content-types can be used, but must be given names starting with "X-". This indicates a private type value, to be used by consenting mail systems by mutual agreement. The standard specifies no subtypes. This same mechanism can be used on the subtype portion of the field to specify private subtype values.
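A Content-Type value therefore has three pieces: a type, a subtype, and optional parameters. The sketch below shows one way to pull them apart with Python's standard `email` package. The helper name `parse_content_type` and the sample identifier are assumptions made for illustration.

```python
from email.message import Message


def parse_content_type(value: str):
    """Split a Content-Type value into (type, subtype, parameters)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns (name, value) pairs; the first pair is the
    # type/subtype itself, so it is skipped here.
    params = dict(msg.get_params()[1:])
    return msg.get_content_maintype(), msg.get_content_subtype(), params


def is_private(name: str) -> bool:
    """Private types and subtypes must start with "X-" (case-insensitive)."""
    return name.lower().startswith("x-")


if __name__ == "__main__":
    # Hypothetical fragment header for the second of three parts.
    header = 'Message/Partial; id="abc@example.org"; number=2; total=3'
    print(parse_content_type(header))
    print(is_private("X-Typename"))
```

Type and subtype names are matched case-insensitively, which is why the helper reports them in lowercase.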

Multipart Message

The multipart message is one of the important extensions of RFC 822.

This content-type is to be used to pack several parts, of possibly differing types and subtypes, into a single RFC 822 message body.

The Content-type field specifying type multipart also includes a delimiter (boundary), which is used to separate each consecutive body part. Each body part is itself structured more or less as an RFC 822 message in miniature - in particular, possibly containing its own Content-type field to describe the type of the part.

An expected use of subtypes of multipart is to add further structure to the parts, to permit a more integrated structure of multipart messages among cooperating user agents.
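As a rough illustration of the structure just described, the following sketch builds a two-part message with Python's standard `email` package and then inspects the boundary. The subject, file name, and function name `build_multipart_message` are invented for the example.

```python
from email.message import EmailMessage


def build_multipart_message() -> EmailMessage:
    """Build a message with a text part and a binary attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Report attached"
    msg.set_content("The report is attached.\n")
    # Adding an attachment converts the message to multipart/mixed and
    # moves the text into the first body part.
    msg.add_attachment(
        b"\x00\x01\x02 raw bytes",
        maintype="application",
        subtype="octet-stream",
        filename="report.bin",  # hypothetical file name
    )
    return msg


message = build_multipart_message()
wire_text = message.as_string()       # serializing also chooses the boundary
boundary = message.get_boundary()

if __name__ == "__main__":
    print(message.get_content_type())
    for part in message.iter_parts():
        # Each part carries its own Content-Type, like a miniature message.
        print(" ", part.get_content_type(), part["Content-Transfer-Encoding"])
```

The library generates a boundary string that does not occur in any of the parts. A hand-written message has to guarantee the same property.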

From chaks@csa.iisc.ernet.in Tue Nov 5 10:40:11 2002
Status: RO
X-Status:
Return-Path:
Delivered-To: chaks@kohinoor.csa.iisc.ernet.in
Received: from csa.iisc.ernet.in (csa.iisc.ernet.in [144.16.67.8])
    by kohinoor.csa.iisc.ernet.in (Postfix) with ESMTP id 76B39E952
    for ; Tue, 5 Nov 2002 10:40:10 +0530 (IST)
Received: by csa.iisc.ernet.in (Postfix) id 99C2C2C8E0;
    Tue, 5 Nov 2002 10:41:15 +0530 (IST)
Delivered-To: chaks@csa.iisc.ernet.in
Received: from kohinoor.csa.iisc.ernet.in (kohinoor.csa.iisc.ernet.in [144.16.67.10])
    by csa.iisc.ernet.in (Postfix) with ESMTP id 052F52C8DE
    for ; Tue, 5 Nov 2002 10:41:15 +0530 (IST)
Received: by kohinoor.csa.iisc.ernet.in (Postfix, from userid 9320)
    id 02CE0E952; Tue, 5 Nov 2002 10:40:08 +0530 (IST)
Date: Tue, 5 Nov 2002 10:40:05 +0530 (IST)
From: P K Abraham
To: K Chakrabarti
Subject: ca project paper
Message-ID:
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="-559023410-851401618-1036473005=:1943"

  This message is in MIME format. The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.
  Send mail to mime@docserver.cac.washington.edu for more info.

---559023410-851401618-1036473005=:1943
Content-Type: TEXT/PLAIN; charset=US-ASCII

---559023410-851401618-1036473005=:1943
Content-Type: APPLICATION/octet-stream; name="stchAppl.ps.gz"
Content-Transfer-Encoding: BASE64
Content-ID:
Content-Description:

Note the following features of this multipart mail message:

· There is a general "Content-Type" for the entire message, of type MULTIPART/MIXED

· The BOUNDARY:

o Is defined in the Content-Type field as: BOUNDARY="-559023410-851401618-1036473005=:1943"

o Appears, preceded by two hyphens ("--"), before each part of the multipart message.

o Appears, preceded and followed by two hyphens ("--"), after the last part to mark the end of the multipart body (this closing line is not shown in the excerpt above).

· The text part defines the "charset" as US-ASCII

· The other part also defines a Content-Transfer-Encoding method, which is BASE64.
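The boundary rules listed above are simple enough to apply by hand. The sketch below splits a multipart body into its parts using only those rules. It is a simplification that assumes a well-formed body with a closing delimiter; the function name `split_multipart_body` and the sample boundary `example-boundary` are made up for the example.

```python
def split_multipart_body(body: str, boundary: str) -> list:
    """Return the raw text of each body part, headers included."""
    delimiter = "--" + boundary      # starts every part
    closing = delimiter + "--"       # ends the multipart body
    parts, current, inside = [], [], False
    for line in body.splitlines():
        stripped = line.rstrip()
        if stripped == closing:
            if inside:
                parts.append("\n".join(current))
            break                    # anything after this is the epilogue
        if stripped == delimiter:
            if inside:
                parts.append("\n".join(current))
            current, inside = [], True
        elif inside:
            current.append(line)     # lines before the first delimiter are skipped
    return parts


BODY = (
    "This is the preamble; MIME-aware readers ignore it.\n"
    "--example-boundary\n"
    "Content-Type: text/plain; charset=US-ASCII\n"
    "\n"
    "Hello.\n"
    "--example-boundary\n"
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAEC\n"
    "--example-boundary--\n"
    "Epilogue, also ignored.\n"
)

if __name__ == "__main__":
    for index, part in enumerate(split_multipart_body(BODY, "example-boundary"), 1):
        print(f"--- part {index} ---")
        print(part)
```

In practice a mail library's parser is the safer choice, since it also handles nested multiparts and malformed input.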

Content-Transfer-Encoding

Many Content-types are represented as 8-bit character or binary data. Such data cannot be transmitted over some transport protocols.

This header field is used to indicate the type of transformation that has been used for re-encoding such data into 7-bit short-line format, in order to represent the body in an acceptable manner for transport.

The possible values defined for this field are:

1. BASE64

2. Quoted-Printable

3. 8bit

4. 7bit

5. Binary

6. X-token

Each encoding method is briefly described as:

1. BASE64

Implies an encoding that consists of lines no longer than 76 ASCII characters. It is suited to representing binary files: every sequence of three bytes is represented as four printable ASCII characters.

2. Quoted-Printable

The encoding scheme implied by this value is most appropriate for data that consists primarily of printable ASCII characters. Using this encoding method, printable ASCII characters are represented as themselves. The encoding consists of lines no longer than 76 ASCII characters.

3. 8bit

The lines are short, but there may be non-ASCII characters.

4. 7bit

The data is all represented as short lines of US-ASCII data. This is the default when the field is absent.

5. Binary

Allows for non-ASCII characters and longer lines (i.e., the lines might not be short enough for SMTP transport).

6. X-token

To define private Content-Transfer-Encoding values, the special X-token mechanism can be used: the name of the private encoding is prefixed by "X-" to indicate its non-standard status.
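The two re-encoding methods, BASE64 and Quoted-Printable, can be tried out with Python's standard `base64` and `quopri` modules. This is a minimal sketch; the wrapper names `to_base64` and `to_quoted_printable` are invented for the example.

```python
import base64
import quopri


def to_base64(data: bytes) -> bytes:
    """Encode binary data as base64 text, wrapped into short lines."""
    return base64.encodebytes(data)


def to_quoted_printable(data: bytes) -> bytes:
    """Encode mostly-ASCII data; other bytes become "=" plus two hex digits."""
    return quopri.encodestring(data)


if __name__ == "__main__":
    # Three input bytes become four output characters.
    print(base64.b64encode(b"Man"))

    # Longer input is wrapped so that no line exceeds 76 characters.
    wrapped = to_base64(bytes(300))
    print(max(len(line) for line in wrapped.splitlines()))

    # Printable ASCII passes through unchanged; the byte 0xE9 is escaped.
    print(to_quoted_printable(b"caf\xe9"))
    print(quopri.decodestring(b"caf=E9"))
```

Base64 always grows the data by about a third, while quoted-printable adds overhead only for the bytes it has to escape, which is why it suits text that is mostly ASCII.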

Non-ASCII Text in Mail Messages

RFC 1522, "MIME Part Two: Message Header Extensions for Non-ASCII Text", describes a technique for encoding non-ASCII text in various portions of an RFC 822 message header in a manner that is unlikely to confuse existing message handling software.

The chosen technique is based on certain sequences of "ordinary" printable ASCII characters (known as "encoded-words") being reserved for use as encoded data.

Generally, an "encoded-word" is a sequence of printable ASCII characters that begins with "=?", ends with "?=", and has two "?"s in between. It specifies a character set and an encoding method, and also includes the original text encoded as graphic ASCII characters, according to the rules for that encoding method.

A mail composer that implements the RFC 1522 specification will provide a means of inputting non-ASCII text in header fields, but will translate these fields into encoded-words before inserting them into the message header.

A mail reader that implements the specification will recognize encoded-words when they appear in certain portions of the message header. Instead of displaying the encoded-word "as is", it will reverse the encoding and display the original text in the designated character set.
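The composer and reader behaviour described above can be sketched with Python's standard `email.header` module. The helper names `encode_header_text` and `decode_header_text` are invented for the example, and ISO-8859-1 is just one possible charset.

```python
from email.header import Header, decode_header, make_header


def encode_header_text(text: str, charset: str = "iso-8859-1") -> str:
    """What a mail composer does: turn non-ASCII text into an encoded-word."""
    return Header(text, charset).encode()


def decode_header_text(value: str) -> str:
    """What a mail reader does: reverse the encoding for display."""
    return str(make_header(decode_header(value)))


if __name__ == "__main__":
    encoded = encode_header_text("p\xf6stal")
    # The result begins with "=?", ends with "?=", and names the charset
    # and the encoding method between the question marks.
    print(encoded)
    print(decode_header_text(encoded))
```

Software that knows nothing about encoded-words simply shows the ASCII form unchanged, which is what keeps the technique compatible with existing message handling software.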
