[Mimedefang] Looking for an example of obfuscated HTML
Kevin A. McGrail
kmcgrail at pccc.com
Tue Aug 5 17:36:59 EDT 2003
That's a dastardly piece of spam. Though I think it should be easy to run a
strip-HTML routine first and then hand the resulting text to SpamAssassin. In
fact, here's a strip-HTML routine I wrote for Perl, based on a PHP DOC example
from some years ago.
Perhaps this is something more for SpamAssassin to implement as a rule (If
it doesn't already ;-) )
There could also be a diff comparison added to it to check that the html and
the plain text are "similar-esque" so that you don't get someone sending
bogus text/plain and spam text/html mime messages.
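A rough sketch of what that comparison could look like, assuming both parts have already been reduced to plain text. The helper names (word_set, text_similarity) and the 0.5 cutoff are made up for illustration and are not part of MIMEDefang or SpamAssassin:

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct lower-cased words of a string
# as the keys of a hash reference.
sub word_set {
    my ($text) = @_;
    my %seen;
    $seen{ lc $1 } = 1 while $text =~ /(\w+)/g;
    return \%seen;
}

# Hypothetical helper: Jaccard similarity of the two vocabularies,
# 0 when no words are shared, 1 when the word sets are identical.
sub text_similarity {
    my ($plain, $stripped_html) = @_;
    my $p = word_set($plain);
    my $h = word_set($stripped_html);

    my %union       = (%$p, %$h);
    my $union_count = keys %union;
    return 1 if $union_count == 0;    # two empty parts count as identical

    my $common = grep { exists $h->{$_} } keys %$p;
    return $common / $union_count;
}

# Assumed cutoff; a real filter would need to tune this on live mail.
my $score = text_similarity("Lunch at noon tomorrow?", "Buy cheap pills now");
print "parts disagree (score $score)\n" if $score < 0.5;
```

Word overlap is only one possible measure; a line-based diff would be stricter but more sensitive to wrapping differences between the two parts.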
KAM
sub stripouthtml {
my ($string) = @_;
$string =~ s/<BR\b[^<>]*>/\r\n/sig; # <br>, <br/>, <br clear=all>
$string =~ s/<P\b[^<>]*>/\r\n\r\n/sig; # <p> with or without attributes
$string =~ s/<script[^>]*?>.*?<\/script>//sig; # Strip out javascript
$string =~ s/<[\/]+?[^<>]*?>//sig; # Strip out END html tags
$string =~ s/<[!]+?[^<>]*?>//sig; # Strip out COMMENT html tags
$string =~ s/<[^<>]*?>//sig; # Strip out BEGIN html tags
#$string =~ s/([\r\n])[\s]+/$1/g; # Strip out white space
$string =~ s/\&(quot|#34);/"/ig; # Replace html entities: "'s
$string =~ s/&(lt|\#60);/</ig; # <'s
$string =~ s/&(gt|\#62);/>/ig; # >'s
$string =~ s/&(nbsp|\#160);/ /ig; # non-breakable spaces
$string =~ s/&(iexcl|\#161);/chr(161)/eig;
$string =~ s/&(cent|\#162);/chr(162)/eig;
$string =~ s/&(pound|\#163);/chr(163)/eig;
$string =~ s/&(copy|\#169);/chr(169)/eig;
$string =~ s/&\#(?!0*38;)(\d+);/chr($1)/eg; # other numeric entities; #38 is left for the next line
$string =~ s/&(amp|\#0*38);/&/ig; # &'s, decoded last so that "&amp;lt;" becomes "&lt;", not "<"
return $string;
}
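Because a chain of separate substitutions makes several passes over the text, the order of the entity rules matters. An alternative is to decode every entity in a single pass with a lookup table, so that nothing produced by one replacement can be picked up again by another. The following is only a sketch: the name decode_entities_once and the %ENTITY table are hypothetical, and the table covers just the entities handled in the routine above.

```perl
use strict;
use warnings;

# Named entities handled by the routine above; extend as needed.
my %ENTITY = (
    quot  => '"',
    amp   => '&',
    lt    => '<',
    gt    => '>',
    nbsp  => ' ',
    iexcl => chr(161),
    cent  => chr(162),
    pound => chr(163),
    copy  => chr(169),
);

# Hypothetical helper: decode named and decimal entities in one pass.
# Decimal 160 becomes a plain space, as in the original routine.
# Unknown names are put back unchanged.
sub decode_entities_once {
    my ($string) = @_;
    $string =~ s{&(?:#(\d+)|(\w+));}{
        defined $1              ? ($1 == 160 ? ' ' : chr($1))
      : exists $ENTITY{lc $2}   ? $ENTITY{lc $2}
      :                           "&$2;"
    }eg;
    return $string;
}

print decode_entities_once('AT&amp;T says 1 &lt; 2'), "\n";   # prints: AT&T says 1 < 2
```

For real mail the HTML::Entities module on CPAN provides decode_entities with the full entity table, which is likely a better choice than maintaining a list by hand.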
> http://www.roaringpenguin.com/dastardly.html
>
> I didn't come up with this; it was (AFAIK) originally proposed by
> John Graham-Cumming. He has a ton more at http://www.jgc.org/tsc/