[Mimedefang] Suggestions on an HTML sanitize program.

Michael D. Sofka sofkam at rpi.edu
Thu Apr 30 14:14:33 EDT 2009


Kevin A. McGrail wrote:
> Michael,
> 
> Thanks to Joseph Brennan, I use this code at the end of sub filter() to 
> achieve what I believe you want:

Thank you. I guess I wasn't clear enough that the application is not 
Mimedefang.  However:

>            $badtag = $output =~ s/<(iframe|script|object)\b/<no-$1 /igs;

Would fix 90% of the problem.  It still leaves other sources of scripts,
such as the "onload" attribute on an image.  It will also miss 
scripts hidden by character encodings. In the interests of having 
something that is quick and simple, however, I may do exactly the above.
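
To make the gap concrete, here is a rough sketch of what the substitution 
catches and what it lets through.  The wrapper sub rename_script_tags() 
and the three sample strings are my own, made up for illustration; only 
the regex itself comes from the quoted line.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the quoted substitution; the sub name
# is an assumption made for this example.
sub rename_script_tags {
    my ($html) = @_;
    # s/// returns the number of substitutions, or an empty string if none.
    my $count = ($html =~ s/<(iframe|script|object)\b/<no-$1 /igs);
    return ($html, $count || 0);
}

# Made-up inputs: a plain script tag, an event-handler attribute,
# and a javascript: URL with one letter written as a character reference.
my @samples = (
    '<p>Hi</p><SCRIPT>alert(1)</SCRIPT>',
    '<img src="x.png" onload="alert(1)">',
    '<a href="java&#115;cript:alert(1)">x</a>',
);

for my $html (@samples) {
    my ($out, $hits) = rename_script_tags($html);
    print "$hits substitution(s): $out\n";
}
```

Only the first sample is rewritten (the opening tag becomes "<no-SCRIPT >"); 
the other two pass through untouched, which is the case a real HTML parser 
is meant to cover.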

On the other hand, once I'm ready to add that line of code, I may as 
well type, for example:

     my $stripped_html = detoxify($html, disallow => [qw(dynamic)]);

and get that extra 9.9%, assuming the overhead of parsing the HTML 
isn't too high, and HTML::Detoxify (or HTML::Defang (or 
HTML::StripScripts)) really does what is claimed, and is updated as new 
exploits are discovered.

In the PHP world, there seems to be a new way to slip a script past the 
standard libraries discovered each week.  But, at least the patches keep 
coming.  (Not to pick on PHP; the problem is in HTML and the difficulty 
of actually detecting when a script is present.  PHP does patch the 
problems as they are discovered.)

Mike

-- 
Michael D. Sofka               sofkam at rpi.edu
C&MT Sr. Systems Programmer,   Email, TeX, Epistemology
Rensselaer Polytechnic Institute, Troy, NY.  http://www.rpi.edu/~sofkam/


