[Mimedefang] Help with Unicode Subject (again please)
Jonas Eckerman
jonas_lists at frukt.org
Thu Oct 23 15:52:52 EDT 2003
On Thu, 23 Oct 2003 21:20:08 +0200 (SAST), Stefan Schoeman wrote:
> Subject:=?ISO-8859-
> 1?B?UmU6R2V0IFNpbGRlbmFmaWwgQ2l0cmF0ZSAgT25saW5lIENoZWFwISBJbnRlcm5l
> dCBTcGVjaWFsIQ==?=
That stuff uses the pretty common charset ISO-8859-1, also known as
ISO Latin-1. It's probably the most common 8-bit charset nowadays.
It's also using MIME to specify the charset and encoding.
Usually subjects in 8-bit charsets are encoded with
Quoted-Printable, but in this example my guess is that the B (in
"?ISO-8859-1?B?") is for Base64. It does look like it is
Base64-encoded, and Base64 is a valid MIME encoding for the subject
as well as the body.
If you want a more authoritative answer, check the RFC that specifies
MIME encoding for headers (that should be RFC 2047) to confirm what
the B stands for.
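As a quick check of the Base64 guess, the encoded text from the quoted subject can be decoded by hand. This is only a sketch in Python (not the Perl a MIMEDefang filter would use); the variable names are made up for illustration, and the two wrapped lines of the quoted header are joined back together first.

```python
import base64

# Encoded text from the quoted Subject, with the mail client's line wrap removed.
encoded_text = (
    "UmU6R2V0IFNpbGRlbmFmaWwgQ2l0cmF0ZSAgT25saW5lIENoZWFwISBJbnRlcm5l"
    "dCBTcGVjaWFsIQ=="
)

# Assumption: "B" means Base64. The charset tag says the bytes are ISO-8859-1.
raw_bytes = base64.b64decode(encoded_text)
subject_text = raw_bytes.decode("iso-8859-1")

print(subject_text)
```

If the guess is right, this prints readable text rather than garbage.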
> However, because of the encoding or Unicode or whatever that
> stuff above is, my regular expression of ^.*[Vv][Ia][Gg][Rr][Aa].*$
If you want your filters to work on subjects containing international
characters, you have to support decoding of subjects encoded
according to the MIME standard. It's not that hard. It's something
like:
Mail Header = <header>: <data>
<header> = Some header where MIME encoded text is ok.
<data> = =?<charset>?<encoding>?<encoded text>?=
<charset> = some MIME supported charset. ISO-8859-1 is the most
common, but far from the only one.
<encoding> = A code that specifies a MIME supported encoding. The
ones I know are allowed are Q (Quoted-Printable) and B (Base64).
There might be others.
Again, find and read the appropriate RFC for more accurate and
authoritative info.
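The structure above can be sketched in code. The following is a minimal Python illustration, not a MIMEDefang filter (which would be written in Perl). It leans on the standard library's email.header module to do the decoding; decode_subject, VIAGRA_RE and the sample subject are hypothetical names and data made up for this example.

```python
import base64
import re
from email.header import decode_header, make_header

# Case-insensitive pattern in the spirit of the quoted regular expression.
VIAGRA_RE = re.compile(r"viagra", re.IGNORECASE)


def decode_subject(raw_subject: str) -> str:
    """Decode any MIME encoded-words in a Subject value to plain text."""
    return str(make_header(decode_header(raw_subject)))


# Build a sample encoded subject at run time (hypothetical test data).
sample_text = "Cheap Viagra"
encoded = base64.b64encode(sample_text.encode("iso-8859-1")).decode("ascii")
sample_subject = "=?ISO-8859-1?B?" + encoded + "?="

# The pattern misses the raw header but matches once it is decoded.
matches_raw = bool(VIAGRA_RE.search(sample_subject))
matches_decoded = bool(VIAGRA_RE.search(decode_subject(sample_subject)))
print(matches_raw, matches_decoded)
```

The point is simply that the filter should run against the decoded subject, not the raw header value.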
Regards
/Jonas
--
Jonas Eckerman, jonas_lists at frukt.org
http://www.fsdb.org/