[Mimedefang] Regexp?

David F. Skoll dfs at roaringpenguin.com
Thu Dec 13 21:38:49 EST 2001


On Thu, 13 Dec 2001, Ashley M. Kirchner wrote:

>     Can MDefang understand a regexp like this:  (sorry for wrapping, if
> any)
>     ext = <huge_expression_omitted>

Well, MIMEDefang's filter language is just Perl, so any regexp Perl
accepts, MIMEDefang accepts too.

But can a *human* understand an expression like that?  I don't think so...
I would simply have an array of smaller regexps to compare against.
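A minimal sketch of the "array of smaller regexps" approach. The poster's full extension expression was omitted, so the extensions below are hypothetical placeholders, as is the helper name `matching_re`. Each pattern stays short enough to read, and the loop reports which one fired. That is handy when you later wonder why a message was rejected.

```perl
use strict;
use warnings;

# Hypothetical list of risky filename patterns; placeholders only,
# since the original huge expression was not shown.
my @bad_filename_res = (
    qr/\.(?:exe|com|scr|pif|bat|cmd)$/i,   # executables and batch files
    qr/\.(?:vbs|vbe|js|jse|wsf|wsh)$/i,    # Windows Script Host files
    qr/\.(?:hta|chm|lnk|reg)$/i,           # other risky types
);

# Return the first pattern in @res that matches $name, or undef if none do.
sub matching_re {
    my ($name, @res) = @_;
    for my $re (@res) {
        return $re if $name =~ $re;
    }
    return;
}

for my $name ('report.doc', 'invoice.pdf.exe', 'readme.txt', 'LOVE.VBS') {
    my $hit = matching_re($name, @bad_filename_res);
    printf "%-16s %s\n", $name, defined $hit ? "matched $hit" : 'ok';
}
```

Adding a new extension then means adding one short line to the list rather than editing a single unreadable expression.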

>     ^content-type:${ws}(multipart/(signed|encrypted))|(application/)

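The content-type for each part is available in the $type argument
to filter().  It holds just the value (e.g. "multipart/signed"), without
the "Content-Type:" header name, so match $type against your regexp
minus the ^content-type:${ws} prefix.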
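A small sketch of matching $type. The helper name `type_is_interesting` is hypothetical; the filter() argument order shown in the comment reflects my understanding of how MIMEDefang calls it, so check your own filter. Note that in the quoted expression the top-level `|` means `^` anchors only the multipart branch; grouping both alternatives with `(?:...)` anchors them both.

```perl
use strict;
use warnings;

# Both alternatives sit inside one group, so ^ applies to each of them.
my $interesting_type_re = qr{^(?:multipart/(?:signed|encrypted)|application/)}i;

# Return 1 if the MIME type (value only, no header name) matches.
sub type_is_interesting {
    my ($type) = @_;
    return $type =~ $interesting_type_re ? 1 : 0;
}

# Inside a MIMEDefang filter this would be used roughly as:
#   sub filter {
#       my ($entity, $fname, $ext, $type) = @_;
#       if (type_is_interesting($type)) { ... }
#   }
for my $type (qw(multipart/signed application/pdf text/plain)) {
    printf "%-18s %s\n", $type,
        type_is_interesting($type) ? 'match' : 'no match';
}
```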

> ^content-disposition:${ws}attachment;${ws}.*name${ws}=${ws}${dq}.*\.${ext}(\..*)?${dq}${ws}$

All the MIME headers are available by querying $entity->head; see the
MIME::Tools man page.
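A sketch of the Content-Disposition check without regexping the raw header line. The idea is to let MIME::Head hand you the disposition value and the filename, then test only those strings. The `mime_attr` and `recommended_filename` calls in the comment are MIME::Head methods as I understand them (verify against the MIME::Tools docs). The helper name and extension list are hypothetical, and the function itself uses only core Perl so it runs standalone.

```perl
use strict;
use warnings;

# Hypothetical extension list, for illustration only.
my $bad_ext_re = qr/\.(?:exe|scr|pif|vbs)$/i;

# Decide whether a part is a risky attachment, given the
# Content-Disposition value and the recommended filename.
# In a filter these might come from MIME::Head, e.g.:
#   my $head  = $entity->head;
#   my $disp  = $head->mime_attr('content-disposition');  # e.g. "attachment"
#   my $fname = $head->recommended_filename;
# MIME::Head should already have unfolded and unquoted these values,
# so no ${ws}/${dq} handling is needed here.
sub is_dangerous_attachment {
    my ($disp, $fname) = @_;
    return 0 unless defined $disp && lc($disp) eq 'attachment';
    return 0 unless defined $fname;
    return $fname =~ $bad_ext_re ? 1 : 0;
}
```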

>     And last, how about scanning the BODY of the message:

>     \<(!doctype|[sp]?h(tml|ead)|title|body)

The decoded body of each part is available in the file
$entity->bodyhandle->path; you can open() that file and read it if you like.

The (un-decoded MIME) body of the entire message is available in the
file ./INPUTMSG; you can open() that file and read it if you like.
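A minimal sketch of scanning either file for the HTML pattern from the question. The helper name is hypothetical. It reads line by line and stops at the first hit. Because ./INPUTMSG is not decoded, a base64-encoded HTML part will not match there, which is one reason to prefer the per-part decoded file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The pattern from the question, case-insensitive since tags may be
# written in upper or lower case.
my $html_re = qr/<(?:!doctype|[sp]?h(?:tml|ead)|title|body)/i;

# Return 1 if the file at $path contains something that looks like the
# start of an HTML document, 0 otherwise. $path could be a part's decoded
# body ($entity->bodyhandle->path) or the raw "./INPUTMSG".
sub file_looks_like_html {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        if ($line =~ $html_re) {
            close($fh);
            return 1;
        }
    }
    close($fh);
    return 0;
}

# Standalone demo using a temporary file in place of a real message part.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} "Hello\n<HTML><body>hi</body></HTML>\n";
close($tmp);
print file_looks_like_html($tmpname) ? "HTML\n" : "plain\n";
```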

> \<(app|bgsound|div|embed|form|i?l(ayer|ink)|img|i?frame(set)?|meta|object|s(cript|tyle))

This is a fool's errand; you can probably split HTML tags across lines like this:

	<
	 app ..
        >

and writing regexps to match all the possibilities will drive you nuts.
If you really want to defang HTML parts, you need a real HTML parser.
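A sketch of the parser approach using HTML::Parser, a CPAN module rather than core Perl (assumed to be installed). The tag list roughly mirrors the regexp in the question and is illustrative only, not a vetted blocklist; the helper name `risky_tags` is hypothetical. The parser deals with whitespace, line breaks, attribute quoting and letter case, and reports tag names lowercased by default.

```perl
use strict;
use warnings;
use HTML::Parser;   # CPAN (HTML-Parser distribution), not core Perl

# Roughly the tag names the question's regexp was hunting for.
my %risky_tag = map { $_ => 1 }
    qw(applet bgsound div embed form frame frameset iframe ilayer img
       layer link meta object script style);

# Return the risky tag names found in $html, in document order.
sub risky_tags {
    my ($html) = @_;
    my @found;
    my $p = HTML::Parser->new(
        api_version => 3,
        # Called for every start tag; 'tagname' passes the lowercased name.
        start_h     => [ sub { push @found, $_[0] if $risky_tag{ $_[0] } },
                         'tagname' ],
    );
    $p->parse($html);
    $p->eof;
    return @found;
}

my $html = qq{<p>Hi</p><SCRIPT\n  type="text/javascript">x()</script>\n}
         . qq{<IFRAME src="x"></iframe>};
print join(',', risky_tags($html)), "\n";
```

To defang rather than just detect, the same handlers can be used to rebuild the document while dropping or rewriting the offending tags.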

Regards,

David.
