[Mimedefang] Regexp?
David F. Skoll
dfs at roaringpenguin.com
Thu Dec 13 21:38:49 EST 2001
On Thu, 13 Dec 2001, Ashley M. Kirchner wrote:
> Can MDefang understand a regexp like this: (sorry for wrapping, if
> any)
> ext = <huge_expression_omitted>
Well, MIMEDefang's filter language is just Perl, so anything Perl can
take, so can MIMEDefang.
But can a *human* understand an expression like that? I don't think so...
I would simply have an array of smaller regexps to compare against.
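
A minimal sketch of the "array of smaller regexps" idea follows. The pattern groupings and the helper name name_is_bad() are hypothetical, chosen only for illustration; the original huge expression is not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical list of small, readable patterns (not the original list).
my @bad_ext = (
    qr/\.(?:exe|com|scr|pif)$/i,        # executables
    qr/\.(?:vbs|vbe|js|jse|wsf|wsh)$/i, # script hosts
    qr/\.(?:bat|cmd|lnk)$/i,            # shell and shortcut files
);

# Return 1 if the name matches any pattern in the list, else 0.
sub name_is_bad {
    my ($name) = @_;
    return 0 unless defined $name;
    for my $re (@bad_ext) {
        return 1 if $name =~ $re;
    }
    return 0;
}
```

Each entry can be read, tested, and commented on its own, which is the main gain over one giant alternation.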
> ^content-type:${ws}(multipart/(signed|encrypted))|(application/)
The content-type for each part is available in the $type argument
to filter(). So just match $type against your regexp.
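
As a rough sketch of matching $type: since $type holds only the type string, the "^content-type:" prefix from the quoted regexp is not needed. The helper name type_is_suspect() is an assumption, and the filter() argument order shown in the comment is my understanding of the usual mimedefang-filter layout rather than something stated above.

```perl
use strict;
use warnings;

# Adapted from the quoted header regexp, applied to the bare type string.
my $suspect_type = qr{^(?:multipart/(?:signed|encrypted)|application/)}i;

# Return 1 if the MIME type looks like one we want to act on, else 0.
sub type_is_suspect {
    my ($type) = @_;
    return (defined $type && $type =~ $suspect_type) ? 1 : 0;
}

# Inside a filter this might be called as (assumed argument order):
#   sub filter {
#       my ($entity, $fname, $ext, $type) = @_;
#       if (type_is_suspect($type)) {
#           # what to do with the part is left open here
#       }
#   }
```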
> ^content-disposition:${ws}attachment;${ws}.*name${ws}=${ws}${dq}.*\.${ext}(\..*)?${dq}${ws}$
All the MIME headers are available by querying $entity->head; see the
MIME::Head man page (part of MIME-tools).
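
A sketch of the header lookup, assuming $entity is a MIME::Entity and using the mime_attr() accessor of MIME::Head. The helper names attachment_name() and has_extension() are hypothetical. The trailing "(\..*)?" from the quoted regexp is kept so that names like "foo.scr.txt" still match.

```perl
use strict;
use warnings;

# Fetch the attachment name from the part's MIME headers, preferring
# Content-Disposition's filename and falling back to Content-Type's name.
sub attachment_name {
    my ($entity) = @_;
    my $head = $entity->head;
    my $name = $head->mime_attr('content-disposition.filename');
    $name = $head->mime_attr('content-type.name') unless defined $name;
    return $name;
}

# Return 1 if the name carries one of the given extensions, optionally
# followed by further ".something" suffixes. $ext_re is an alternation
# such as 'exe|scr|pif'.
sub has_extension {
    my ($entity, $ext_re) = @_;
    my $name = attachment_name($entity);
    return (defined $name && $name =~ /\.(?:$ext_re)(?:\..*)?$/i) ? 1 : 0;
}
```

Working from the parsed attribute avoids having to model whitespace and quoting in the raw header line.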
> And last, how about scanning the BODY of the message:
> \<(!doctype|[sp]?h(tml|ead)|title|body)
The body of each part is available in the file $entity->bodyhandle->path;
you can open() that file and read it if you like.
The (un-decoded MIME) body of the entire message is available in the
file ./INPUTMSG; you can open() that file and read it if you like.
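
A minimal sketch of scanning either file line by line. The helper name file_matches() is hypothetical; the pattern is the one quoted above. The comments assume the body file name comes from the path() accessor of MIME::Body, and that bodyhandle may be undefined for parts with no body of their own (for example multipart containers).

```perl
use strict;
use warnings;

# The body-scanning pattern from the question, made case-insensitive.
my $html_marker = qr/<(?:!doctype|[sp]?h(?:tml|ead)|title|body)/i;

# Return 1 if any line of the file matches $re, else 0.
# An unreadable file is treated as "no match".
sub file_matches {
    my ($path, $re) = @_;
    open(my $fh, '<', $path) or return 0;
    my $found = 0;
    while (my $line = <$fh>) {
        if ($line =~ $re) {
            $found = 1;
            last;
        }
    }
    close($fh);
    return $found;
}

# Possible uses inside a filter (assumptions, see above):
#   my $bh = $entity->bodyhandle;
#   file_matches($bh->path, $html_marker) if defined $bh;
#   file_matches('./INPUTMSG', $html_marker);
```

Note that ./INPUTMSG is not MIME-decoded, so a base64 or quoted-printable part will not match there even if the decoded part would.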
> \<(app|bgsound|div|embed|form|i?l(ayer|ink)|img|i?frame(set)?|meta|object|s(cript|tyle))
This is a fool's errand; you can probably split HTML tags like this:

    <
    app ..
    >

and writing regexps to match all the possibilities will drive you nuts.
If you really want to defang HTML parts, you need a real HTML parser.
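
The split-tag problem can be seen in a small experiment. This is only an illustration with made-up input; the tag list is shortened from the quoted pattern, and whether a given mail client actually honours a tag split this way is an assumption.

```perl
use strict;
use warnings;

# Shortened version of the quoted tag pattern.
my $tag_re = qr/<(?:script|iframe|object)/i;

# A tag split across lines, as in the example above.
my @lines = ("<\n", "script src='x.js'\n", ">\n");

# Line-by-line matching: count of lines that match.
my $line_hits = grep { $_ =~ $tag_re } @lines;

# Matching the whole text at once still fails, because a newline
# sits between "<" and the tag name.
my $whole      = join '', @lines;
my $whole_hits = ($whole =~ $tag_re) ? 1 : 0;

# Allowing whitespace after "<" catches this one case only.
my $loose_hits = ($whole =~ /<\s*(?:script|iframe|object)/i) ? 1 : 0;

print "line-by-line: $line_hits, whole: $whole_hits, loosened: $loose_hits\n";
```

Loosening the pattern handles this one variation, but every further variation needs another patch, which is why a real parser is the saner route.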
Regards,
David.