[Mimedefang] Looking for an example of obfuscated HTML

Kevin A. McGrail kmcgrail at pccc.com
Tue Aug 5 17:36:59 EDT 2003


That's a dastardly piece of spam, though I think it should be easy to run a
strip-HTML routine over the message and then hand the result to SpamAssassin.
In fact, here's a strip-HTML routine I wrote in Perl, based on a PHP
documentation example, some years ago.

Perhaps this is something more for SpamAssassin to implement as a rule (if
it doesn't already ;-) )

A diff-style comparison could also be added to check that the text/html part,
once stripped, is "similar-esque" to the text/plain part, so that you catch
someone sending a bogus text/plain part alongside a spam text/html part in
the same MIME message.

KAM

sub stripouthtml {
  my ($string) = @_;
  $string =~ s/<br\s*\/?>/\r\n/sig;               # <BR> and <BR /> become line breaks
  $string =~ s/<p(?:\s[^<>]*)?>/\r\n\r\n/sig;     # <P> starts a new paragraph
  $string =~ s/<script[^>]*?>.*?<\/script>//sig;  # Strip out javascript
  $string =~ s/<\/[^<>]*?>//sig;                  # Strip out END html tags
  $string =~ s/<![^<>]*?>//sig;                   # Strip out COMMENT html tags
  $string =~ s/<[^<>]*?>//sig;                    # Strip out BEGIN html tags
  #$string =~ s/([\r\n])[\s]+/$1/g;               # Strip out white space
  $string =~ s/&(quot|#34);/"/ig;                 # Replace html entities: "'s
  $string =~ s/&(lt|#60);/</ig;                   # <'s
  $string =~ s/&(gt|#62);/>/ig;                   # >'s
  $string =~ s/&(nbsp|#160);/ /ig;                # non-breakable spaces
  $string =~ s/&(iexcl|#161);/chr(161)/eig;
  $string =~ s/&(cent|#162);/chr(162)/eig;
  $string =~ s/&(pound|#163);/chr(163)/eig;
  $string =~ s/&(copy|#169);/chr(169)/eig;
  $string =~ s/&#(?!0*38;)(\d+);/chr($1)/eg;      # evaluate other numeric entities
  $string =~ s/&(amp|#0*38);/&/ig;                # &'s, decoded last so that
                                                  # "&amp;lt;" gives "&lt;", not "<"

  return $string;

}
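
The "similar-esque" check mentioned above could be sketched roughly as
follows. This is only an illustration, not something MIMEDefang or
SpamAssassin is known to ship: the names word_set and text_similarity are
made up for this example, and word overlap (Jaccard similarity) stands in
for a real diff. It assumes the text/html part has already been run through
a strip-HTML routine.

```perl
use strict;
use warnings;

# Hypothetical helper: the set of lower-cased words found in a text.
sub word_set {
  my ($text) = @_;
  my %seen;
  $seen{lc $1} = 1 while $text =~ /(\w+)/g;
  return \%seen;
}

# Hypothetical helper: Jaccard similarity of the two word sets,
# i.e. shared words divided by all distinct words (0 = disjoint, 1 = same).
sub text_similarity {
  my ($plain, $stripped_html) = @_;
  my $p = word_set($plain);
  my $h = word_set($stripped_html);
  my $shared = grep { exists $h->{$_} } keys %$p;  # count of common words
  my %union  = (%$p, %$h);
  my $total  = keys %union;                        # count of distinct words
  return $total ? $shared / $total : 1;            # two empty parts count as equal
}

my $plain = "Meeting moved to 3pm on Friday";
my $html  = "Buy cheap pills now";                 # text/html part after stripping
my $score = text_similarity($plain, $html);

# The 0.5 cut-off is an arbitrary assumption and would need tuning.
print "text/plain and text/html parts differ (score $score)\n" if $score < 0.5;
```

A message whose two parts share almost no words gets a score near 0, which
could then feed a scoring rule rather than being an outright reject, since
legitimate mailers sometimes put only a short "view this in HTML" note in the
text/plain part.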

> http://www.roaringpenguin.com/dastardly.html
>
> I didn't come up with this; it was (AFAIK) originally proposed by
> John Graham-Cumming.  He has a ton more at http://www.jgc.org/tsc/



