AsyncOS uses the UNIX mbox format when it archives messages (as set in the anti-spam and anti-virus configuration) and when it logs messages (with the message filter log() action).
The mbox format is an ASCII (that is, not binary) file format that can contain zero or more mail messages. Messages are concatenated in the mbox file and can be separated again by looking for specific strings in the file. Each message is stored in the same form in which it is transferred between RFC 2821-compliant mail gateways.
Each message in an mbox file begins with a line that starts with the string "From " (the ASCII characters F, r, o, m, and space). On that same line, the "From " string is followed by several more fields: envelope-sender, date, and (optionally) more-data.
The first field after the "From " string is the envelope-sender of the message. Depending on which application created the mbox file, the envelope-sender may be a real mailbox address, or it may be some other character or string. Most commonly, a "-" (single dash) replaces the envelope-sender when the actual envelope-sender is not available or not known. The date field inserted by the ESA (Email Security Appliance) is in standard UNIX asctime() format and is always 24 characters long. In some mbox files written by non-AsyncOS implementations, further information follows the date stamp. These three fields are separated by a single space.
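The layout of the "From " line described above can be sketched in Python as a pair of helpers. This is a minimal sketch under stated assumptions: `make_from_line` and `parse_from_line` are hypothetical names, the date is rendered in UTC (which time zone AsyncOS actually uses is an assumption), and parsing assumes the AsyncOS layout of a sender followed by a 24-character asctime() date. Lines written by other tools may not fit this layout.

```python
import time
from typing import Optional, Tuple

ASCTIME_LEN = 24  # e.g. "Sun Oct 17 12:03:20 2004"


def make_from_line(sender: Optional[str], when: float) -> str:
    """Build a separator line: "From " + envelope-sender + " " + asctime date.

    A "-" stands in for an unknown envelope-sender. Rendering the date in UTC
    via time.gmtime() is an assumption made for this sketch.
    """
    date = time.asctime(time.gmtime(when))  # 24 characters for 4-digit years
    return f"From {sender or '-'} {date}"


def parse_from_line(line: str) -> Tuple[str, str, str]:
    """Split a separator line into (envelope_sender, date, more_data).

    The date is taken by length rather than by splitting on spaces, because
    asctime() pads single-digit days with a space, as in "Thu Jan  1".
    """
    if not line.startswith("From "):
        raise ValueError("not an mbox separator line")
    sender, _, rest = line[5:].partition(" ")
    date, more_data = rest[:ASCTIME_LEN], rest[ASCTIME_LEN:].strip()
    return sender, date, more_data
```

Slicing a fixed 24 characters is what makes the optional more-data field safe to separate, even when the day of the month contains an extra padding space.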
Here is an example of an mbox file with a single message in it:
From Adam@Outside.COM Sun Oct 17 12:03:20 2004
Received: from mail.outside.com (188.8.131.52) by smtp.alpha.com with ESMTP; 17 Oct 2004 12:03:20 -0700
X-IronPort-AV: i="3.85,147,1094454000"; v="EICAR-AV-Test'0'v"; d="scan'208"; a="86:adNrHT37924848"
X-IronPort-RCPT-TO: email@example.com
From: Adam@Outside.COM
To: Alan Alpha <Alan@mail.example.COM>
Subject: Exercise 7a Anti-Virus Scanning
Reply-To: Adam Alpha <firstname.lastname@example.org>
Date: Sun, 17 Oct 2004 12:02:39 -0700
MIME-version: 1.0
Content-type: multipart/mixed; boundary="IronPort"
When parsing mbox files, avoid reading too much meaning into the "From " line that separates messages. Because many different utilities write mbox files, these lines vary considerably. However, a line that starts with "From " can always be used as a separator that reliably indicates the start of a new message in the mbox file. There are about 20 known formats for the text that follows the "From " separator, which makes parsing that text very difficult in the general case.
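Following that advice, a reader can split an mbox file on separator lines alone and leave the rest of each "From " line uninterpreted. The sketch below assumes the whole file fits in memory as a string; `split_mbox` is a hypothetical name, and discarding any text before the first separator is a choice made for this sketch.

```python
from typing import Iterator, List


def split_mbox(text: str) -> Iterator[List[str]]:
    """Yield each message as a list of lines, starting with its "From " line.

    Only the five characters "From " at the start of a line mark a separator;
    the rest of that line is deliberately not parsed. Quoted lines such as
    ">From " never start with "From ", so they do not split a message.
    Text before the first separator is ignored (an assumption of this sketch).
    """
    current: List[str] = []
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            if current:
                yield current
            current = [line]
        elif current:
            current.append(line)
    if current:  # end of file also ends the last message
        yield current
```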
Following the "From " line is an email message in RFC 2822 format: a series of message headers, then a blank line, then the message body.
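Because everything after the separator line is an ordinary RFC 2822 message, a standard message parser can handle it once the "From " line is removed. The sketch below uses Python's standard `email` package. The input shape (a list of lines with the separator first) and the name `parse_message` are assumptions for illustration.

```python
from email import message_from_string
from email.message import Message
from typing import List


def parse_message(lines: List[str]) -> Message:
    """Parse one mbox message (separator line first) as an RFC 2822 message.

    The "From " separator is not a header, so it is dropped before parsing.
    The parser ends the header block at the first blank line; everything
    after that line becomes the body (payload).
    """
    return message_from_string("".join(lines[1:]))
```

Note that this does not undo the ">From " quoting described next; the body still contains the quoted lines exactly as they appear in the file.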
To ensure that messages are separated correctly, any line within a message that begins with the string "From " is always prefixed with a single ">". Different mbox variants handle lines beginning with ">From " differently; in early implementations of applications that wrote mbox files, these lines were not themselves quoted. AsyncOS log files always prepend a ">" to lines that begin with one or more ">" characters followed by "From ".
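The quoting rule above can be written as a pair of line transforms. This is a sketch, not AsyncOS code: `quote_line` and `unquote_line` are hypothetical names. This scheme is commonly called "mboxrd". Because every ">...From " line gains exactly one ">" when written, a reader can remove exactly one ">" and recover the original line. That is not possible when only bare "From " lines are quoted.

```python
import re

# A line that begins with zero or more ">" characters followed by "From ".
_FROM_QUOTED = re.compile(r">*From ")


def quote_line(line: str) -> str:
    """Writing: prepend one ">" to lines matching ^>*From ."""
    return ">" + line if _FROM_QUOTED.match(line) else line


def unquote_line(line: str) -> str:
    """Reading: remove one ">" from lines matching ^>+From ."""
    if line.startswith(">") and _FROM_QUOTED.match(line[1:]):
        return line[1:]
    return line
```

`re.match` only matches at the start of the string, so a "From " that appears in the middle of a line (for example, "Subject: From here") is left alone.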
Here is an example of an mbox file containing a message whose lines originally began with the strings "From ", ">From ", and ">>>>From ":
From email@example.com Sun Dec 12 12:27:33 2004
X-IronPort-RCPT-TO: firstname.lastname@example.org
From: email@example.com
To: firstname.lastname@example.org
Subject: Quote this, if you dare
Date: Sun, 12 Dec 2004 12:28:00 -0700

The following line is just From
>From A From Line
The following line has quoted >From
>>From A >From Line
The following line has many >>>>From
>>>>>From This line has 4 > characters before From
And this is the last line
The end of a message in an mbox file is traditionally signaled by a blank line. However, this blank line is not always present (although AsyncOS does write it). When parsing an mbox file, treat a message as ending either at the start of the next message (discarding the trailing blank line, if one is present) or at the end of the file.
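The end-of-message rule can be applied to one message chunk, meaning the lines from a "From " separator up to the next separator or up to the end of the file. The sketch below assumes that input shape; `message_lines` is a hypothetical name. It drops the separator line, then removes at most one trailing blank line. The blank line belongs to the mbox framing, not to the message, and it may be missing for the last message in the file.

```python
from typing import List


def message_lines(chunk: List[str]) -> List[str]:
    """Return the RFC 2822 lines of one mbox message chunk.

    `chunk` starts with its "From " separator and runs up to (not including)
    the next separator, or to end of file. One trailing blank line, if
    present, is treated as framing and removed; any earlier blank lines
    are kept.
    """
    body = chunk[1:]
    if body and body[-1].strip("\r\n") == "":
        body = body[:-1]
    return body
```

Removing only a single blank line is deliberate: any additional blank lines before it are treated as part of the message body.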
Another mbox variant signaled the length of each message in a "Content-Length" field in the message header. That variant did not quote "From " lines. AsyncOS does not use this variant and does not insert a Content-Length field.