[Mimedefang] Help with Unicode Subject (again please)

Jonas Eckerman jonas_lists at frukt.org
Thu Oct 23 15:52:52 EDT 2003


On Thu, 23 Oct 2003 21:20:08 +0200 (SAST), Stefan Schoeman wrote:

> Subject:=?ISO-8859-
> 1?B?UmU6R2V0IFNpbGRlbmFmaWwgQ2l0cmF0ZSAgT25saW5lIENoZWFwISBJbnRlcm5l
> dCBTcGVjaWFsIQ==?=

That stuff uses the pretty common charset ISO-8859-1, also known as
ISO Latin-1. It's probably the most common 8-bit charset nowadays.
It's also using MIME to specify the charset and encoding.

Usually subjects in 8-bit charsets are encoded with quoted-printable,
but in this example the B (in "=?ISO-8859-1?B?") stands for Base64.
The text does look Base64-encoded, and Base64 is a valid MIME
encoding for the subject as well as the body.
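
As a quick check, here is a small Python sketch that Base64-decodes
the encoded text from the quoted subject. Python is only my choice for
illustration, and the sketch assumes the wrapped lines of the quoted
header are joined back into one value.

```python
import base64

# Encoded text copied from the quoted Subject header, with the
# line wrapping removed (assumption: the wrap came from the mail client).
payload = (
    "UmU6R2V0IFNpbGRlbmFmaWwgQ2l0cmF0ZSAgT25saW5lIENoZWFwISBJbnRlcm5l"
    "dCBTcGVjaWFsIQ=="
)

raw = base64.b64decode(payload)      # Base64 text -> raw bytes
subject = raw.decode("iso-8859-1")   # bytes -> text, using the declared charset
print(subject)
```

Note that the decoded subject mentions Sildenafil rather than Viagra,
so the quoted expression would not match this particular message even
after decoding.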

If you want a more authoritative answer, check RFC 2047, which
specifies MIME encoding for headers, to confirm what the B stands for.

> However, because of the encoding or Unicode or whatever that
> stuff above is, my regular expression of ^.*[Vv][Ia][Gg][Rr][Aa].*$

If you want your filters to work on subjects containing international
characters, you have to decode subjects encoded according to the MIME
standard before matching against them. It's not that hard. The format
is something like:

Mail Header	=	<header>: <data>

<header>	=	Some header where MIME encoded text is ok.

<data>	=	=?<charset>?<encoding>?<encoded text>?=

<charset>	=	some MIME supported charset. ISO-8859-1 is the most 
common, but far from the only one.

<encoding>	=	A one-letter code that specifies the encoding: Q for
a quoted-printable style encoding, or B for Base64. These are the two
that RFC 2047 defines.
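
The layout above can be sketched as a regular expression. This is
only an illustration in Python: the pattern and the names
ENCODED_WORD and header_value are mine, and the pattern is looser
than the exact grammar in the RFC.

```python
import re

# =?<charset>?<encoding>?<encoded text>?=
ENCODED_WORD = re.compile(
    r"=\?(?P<charset>[^?\s]+)"    # charset, e.g. ISO-8859-1
    r"\?(?P<encoding>[BbQq])"     # B = Base64, Q = quoted-printable style
    r"\?(?P<text>[^?\s]*)\?="     # encoded text, up to the closing ?=
)

# The quoted subject, rejoined into a single line.
header_value = (
    "=?ISO-8859-1?B?UmU6R2V0IFNpbGRlbmFmaWwgQ2l0cmF0ZSAgT25saW5lIENoZWFwISBJbnRlcm5l"
    "dCBTcGVjaWFsIQ==?="
)

match = ENCODED_WORD.search(header_value)
fields = match.groupdict()
print(fields["charset"], fields["encoding"])
```

A real header can mix plain text with several encoded parts, so a
filter would loop over every match rather than take only the first.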

Again, read RFC 2047 for more accurate and authoritative info.
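
To show the whole idea of decoding first and filtering second, here
is a minimal Python sketch using the standard email.header module.
It is not MIMEDefang code: the function names and the sample subject
are made up for illustration, and the pattern is a case-insensitive
stand-in for the expression quoted above.

```python
import re
from email.header import decode_header, make_header

# Case-insensitive stand-in for the quoted expression.
VIAGRA = re.compile(r"viagra", re.IGNORECASE)

def decode_subject(value):
    """Turn a possibly MIME-encoded header value into plain text."""
    return str(make_header(decode_header(value)))

def subject_matches(value):
    """Decode the subject first, then apply the filter."""
    return VIAGRA.search(decode_subject(value)) is not None

# Hypothetical subject: "Cheap Viagra", Base64-encoded in ISO-8859-1.
sample = "=?ISO-8859-1?B?Q2hlYXAgVmlhZ3Jh?="

print(VIAGRA.search(sample) is not None)   # filter on the raw header
print(subject_matches(sample))             # filter on the decoded text
```

The raw header hides the word inside the Base64 text, so only the
decoded check can find it. A malformed header can make the decoding
step raise an error, which a real filter would need to catch.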

Regards
/Jonas

-- 
Jonas Eckerman, jonas_lists at frukt.org
http://www.fsdb.org/




