From f at zz.de Mon Apr 10 05:32:46 2023
From: f at zz.de (Florian Lohoff)
Date: Mon, 10 Apr 2023 11:32:46 +0200
Subject: [Mimedefang] HTML Mail / Active content filter
Message-ID: <20230410093246.uso76bckwzwj5tm7@pax.zz.de>

Hi,
i'd like to drop/replace HTML attachments/mails which contain active
components like javascript/javascript external refs.

Basically going through all text/html etc parts. I am unsure whether
i'd need to really decode HTML with HTML::Parse or the like to find it
or if simple "regex" matching would be sufficient. Currently i am
dropping this by spamassassin with custom filters using regex.

Has anyone an example for this or experience which HTML perl module
is the most stable?

And while at it. I tried my luck to do this also with PDF with active
content, trying to parse PDF with CAM::PDF (or PDF::API2) to drop
PDFs with active content. So if anyone has suggestions here would
also be nice.

Flo
--
Florian Lohoff f at zz.de
Any sufficiently advanced technology is indistinguishable from magic.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL:

From giovanni at paclan.it Tue Apr 11 05:49:39 2023
From: giovanni at paclan.it (giovanni at paclan.it)
Date: Tue, 11 Apr 2023 11:49:39 +0200
Subject: [Mimedefang] HTML Mail / Active content filter
In-Reply-To: <20230410093246.uso76bckwzwj5tm7@pax.zz.de>
References: <20230410093246.uso76bckwzwj5tm7@pax.zz.de>
Message-ID:

On 4/10/23 11:32, Florian Lohoff via MIMEDefang wrote:
>
> Hi,
> i'd like to drop/replace HTML attachments/mails which contain active
> components like javascript/javascript external refs.
>
> Basically going through all text/html etc parts. I am unsure whether
> i'd need to really decode HTML with HTML::Parse or the like to find it
> or if simple "regex" matching would be sufficient. Currently i am
> dropping this by spamassassin with custom filters using regex.
>
> Has anyone an example for this or experience which HTML perl module
> is the most stable?
>

It can be done using HTML::Parser, and then running Mail::MIMEDefang::Actions::action_rebuild().
In some cases it can be tricky because html attachments could be base64 encoded.

 Giovanni

> And while at it. I tried my luck to do this also with PDF with active
> content, trying to parse PDF with CAM::PDF (or PDF::API2) to drop
> PDFs with active content. So if anyone has suggestions here would
> also be nice.
>
> Flo

From f at zz.de Tue Apr 11 06:06:12 2023
From: f at zz.de (Florian Lohoff)
Date: Tue, 11 Apr 2023 12:06:12 +0200
Subject: [Mimedefang] HTML Mail / Active content filter
In-Reply-To:
References: <20230410093246.uso76bckwzwj5tm7@pax.zz.de>
Message-ID: <20230411100612.wpq3rll43skknoo6@pax.zz.de>

On Tue, Apr 11, 2023 at 11:49:39AM +0200, giovanni--- via MIMEDefang wrote:
> On 4/10/23 11:32, Florian Lohoff via MIMEDefang wrote:
> >
> > Hi,
> > i'd like to drop/replace HTML attachments/mails which contain active
> > components like javascript/javascript external refs.
> >
> > Basically going through all text/html etc parts. I am unsure whether
> > i'd need to really decode HTML with HTML::Parse or the like to find it
> > or if simple "regex" matching would be sufficient. Currently i am
> > dropping this by spamassassin with custom filters using regex.
> >
> > Has anyone an example for this or experience which HTML perl module
> > is the most stable?
> >
> It can be done using HTML::Parser, and then running Mail::MIMEDefang::Actions::action_rebuild().
> In some cases it can be tricky because html attachments could be base64 encoded.

Yeah - A customer of mine got bitten by this (Cleaning up the ransomware
rubble for 3 weeks now. Massive base64-encoded javascript chunk. Chrome
110 sandbox escape.)

I'd rather block the mail or drop the whole attachment/mimepart if there
are any signs of "javascript".

From my quick analysis javascript in mails is pretty rare and in 99% of
the cases spam/ad stuff. I right now have a simple custom rule in
spamassassin scoring the above very high as spam and rejecting it. But
for my taste that's too simple. I'd rather walk through all individual
MIME parts.

Flo
--
Florian Lohoff f at zz.de
Any sufficiently advanced technology is indistinguishable from magic.

From kmcgrail at pccc.com Tue Apr 11 06:53:48 2023
From: kmcgrail at pccc.com (Kevin A. McGrail)
Date: Tue, 11 Apr 2023 06:53:48 -0400
Subject: [Mimedefang] HTML Mail / Active content filter
In-Reply-To: <20230411100612.wpq3rll43skknoo6@pax.zz.de>
References: <20230410093246.uso76bckwzwj5tm7@pax.zz.de> <20230411100612.wpq3rll43skknoo6@pax.zz.de>
Message-ID:

On 4/11/2023 6:06 AM, Florian Lohoff via MIMEDefang wrote:
> From my quick analysis javascript in mails is pretty rare and in 99% of
> the cases spam/ad stuff. I right now have a simple custom rule in
> spamassassin scoring the above very high as spam and rejecting it. But
> for my taste that's too simple. I'd rather walk through all individual
> MIME parts.

From my experience, there is a lot of javascript in emails from a lot of
name brands. However, MIMEDefang's origins are based on exactly this
type of concept when DFS invented it.

There are a LOT of obfuscation techniques but there are also real (but very
stupid) banks that do things like email html files for instructions to their
clients and things.

Do you have a sample of the file with the bad HTML so I can see if there
are SA rules that hit it too?

Regards,
KAM

From f at zz.de Tue Apr 11 07:34:01 2023
From: f at zz.de (Florian Lohoff)
Date: Tue, 11 Apr 2023 13:34:01 +0200
Subject: [Mimedefang] HTML Mail / Active content filter
In-Reply-To:
References: <20230410093246.uso76bckwzwj5tm7@pax.zz.de> <20230411100612.wpq3rll43skknoo6@pax.zz.de>
Message-ID: <20230411113400.nbvmmiqcoqxckuuq@pax.zz.de>

Hi Kevin,

On Tue, Apr 11, 2023 at 06:53:48AM -0400, Kevin A. McGrail via MIMEDefang wrote:
> There are a LOT of obfuscation techniques but there are also real (but very
> stupid) banks that do things like email html files for instructions to their
> clients and things.
>
> Do you have a sample of the file with the bad HTML so I can see if there
> are SA rules that hit it too?

Normal Spamassassin did not match anything significant - I added these as custom
rules:

    rawbody   ZZ_JS_MIME     /["']text\/javascript["']/i
    describe  ZZ_JS_MIME     Javascript mimetype
    score     ZZ_JS_MIME     4

    rawbody   ZZ_JS_SCRIPT   /\<\s*script\s+.*src\s*=\s*["']\s*(https|http):/i
    describe  ZZ_JS_SCRIPT   External javascript
    score     ZZ_JS_SCRIPT   7.0

    rawbody   ZZ_JS_SCRIPT2  /javascript/i
    describe  ZZ_JS_SCRIPT2  Only javascript string
    score     ZZ_JS_SCRIPT2  0.1

HTML attachment part of the mail started like this. Then it had an image
as base64 and a div with hundreds of base64 snippets which - when merged - was
a long javascript. So i guess they included jquery for its base64
decoder and the other external script uri to jumpstart decoding and
running the JS code.

$customersname

KGZ1bmN0aW9uIChtaWR...

eXBlY3RpbmlicmFuY2goZ3JheXd ...

[ ... ]

Flo
--
Florian Lohoff f at zz.de
Any sufficiently advanced technology is indistinguishable from magic.

From dianne at skoll.ca Tue Apr 11 07:59:09 2023
From: dianne at skoll.ca (Dianne Skoll)
Date: Tue, 11 Apr 2023 07:59:09 -0400
Subject: [Mimedefang] HTML Mail / Active content filter
In-Reply-To: <20230410093246.uso76bckwzwj5tm7@pax.zz.de>
References: <20230410093246.uso76bckwzwj5tm7@pax.zz.de>
Message-ID: <20230411075909.066b0c39@gato.skoll.ca>

On Mon, 10 Apr 2023 11:32:46 +0200
Florian Lohoff via MIMEDefang wrote:

> i'd like to drop/replace HTML attachments/mails which contain active
> components like javascript/javascript external refs.

I think you'll find yourself blocking or damaging quite a lot of valid
email.

I think a better approach is to sanitize HTML parts by removing all tags
except for a specific set of allowed tags. You may also want to remove
tag attributes except for a specific set of allowed attributes.

You could use a Perl module like HTML::Defang or HTML::Restrict or
HTML::Scrubber or HTML::Detoxifier or... well, you have many options. :)
Pick the one you like best.

You probably also want to avoid rebuilding the message unless the HTML
sanitizer actually made changes; there's no point in gratuitously
creating a new message and possibly breaking signatures if nothing was
changed.

If you do find HTML mail where the "body" is essentially a
document.write call on a function of a whole bunch of base64-encoded
content, then yeah... that's probably malicious and can be dropped.
Not exactly sure how to detect that, but IMO document.write in an HTML
mail is suspicious enough on its own to block.

Also, of course, plugging https://mailmunge.org/ :) Can't resist.

Regards,

Dianne.
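
As a rough illustration of the allow-list approach described above, here is a
minimal sketch. It is not from the thread, and it is written in Python with
only the standard library so that it is self-contained; in a MIMEDefang
filter one of the Perl modules named above would do this job instead. The
tag and attribute lists, the `sanitize` function and the `removed` flag are
all assumptions chosen for the example.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-lists for illustration; tune them for your own mail flow.
ALLOWED_TAGS = {"a", "b", "br", "div", "em", "i", "li", "ol", "p", "span", "strong", "ul"}
ALLOWED_ATTRS = {"href", "title"}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")
VOID_TAGS = {"br"}
# Elements whose text content is dropped along with the tag itself.
DROP_WITH_CONTENT = {"script", "style"}


class AllowListSanitizer(HTMLParser):
    """Re-emit only allow-listed tags and attributes; record whether anything was removed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.removed = False
        self._drop_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._drop_depth += 1
            self.removed = True
            return
        if self._drop_depth or tag not in ALLOWED_TAGS:
            self.removed = True
            return
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED_ATTRS:
                self.removed = True  # e.g. onclick, style, src
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                self.removed = True  # e.g. javascript: or data: URLs
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self._drop_depth = max(0, self._drop_depth - 1)
        elif not self._drop_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text outside dropped elements is kept, re-escaped.
        if not self._drop_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    """Return (sanitized_html, removed); removed is True if a tag or attribute was dropped."""
    parser = AllowListSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out), parser.removed


clean, removed = sanitize('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>')
print(clean, removed)
```

A filter built along these lines would call `sanitize` on each decoded
text/html part and rebuild the message only when the flag is true, which
follows the advice above about not breaking signatures when nothing changed.
Comments and doctype declarations are silently dropped from the output
without setting the flag.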
From kmcgrail at pccc.com Tue Apr 11 08:04:02 2023
From: kmcgrail at pccc.com (Kevin A. McGrail)
Date: Tue, 11 Apr 2023 08:04:02 -0400
Subject: [Mimedefang] [External] Re: HTML Mail / Active content filter
In-Reply-To: <20230411113400.nbvmmiqcoqxckuuq@pax.zz.de>
References: <20230410093246.uso76bckwzwj5tm7@pax.zz.de> <20230411100612.wpq3rll43skknoo6@pax.zz.de> <20230411113400.nbvmmiqcoqxckuuq@pax.zz.de>
Message-ID: <7bf6c422-05cb-14b8-7b48-b06d24fff3f7@pccc.com>

On 4/11/2023 7:34 AM, Florian Lohoff wrote:
> On Tue, Apr 11, 2023 at 06:53:48AM -0400, Kevin A. McGrail via MIMEDefang wrote:
>> There are a LOT of obfuscation techniques but there are also real (but very
>> stupid) banks that do things like email html files for instructions to their
>> clients and things.
>>
>> Do you have a sample of the file with the bad HTML so I can see if there
>> are SA rules that hit it too?
> Normal Spamassassin did not match anything significant - I added these as custom
> rules:

I would suggest you look at the KAM Ruleset from https://mcgrail.com and
look at the rules based on the MIMEHeader plugin where you could trigger
on html files being attached.

> HTML attachment part of the mail started like this. Then it had an image
> as base64 and a div with hundreds of base64 snippets which - when merged - was
> a long javascript. So i guess they included jquery for its base64
> decoder and the other external script uri to jumpstart decoding and
> running the JS code.

Yeah, definitely using MIMEDefang (or mailmunge) to remove Javascript tags is
a good idea if you don't want to outright block html file attachments.
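
The idea of checking every HTML part, including attached .html files whose
body is base64-encoded, can be sketched as follows. This is a hedged
illustration in Python using only the standard `email` package, not
MIMEDefang or mailmunge code; the function name and the two patterns are
assumptions made for the example, and plain pattern matching like this will
miss obfuscated script.

```python
import re
from email import message_from_bytes, policy

# Hypothetical patterns for illustration only; obfuscated markup can evade them.
ACTIVE_PATTERNS = [
    re.compile(rb"<\s*script\b", re.IGNORECASE),
    re.compile(rb"javascript\s*:", re.IGNORECASE),
]


def html_parts_with_active_content(raw_message):
    """Yield (filename, content_type) for each HTML part that matches a pattern."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no body of their own
        ctype = part.get_content_type()
        name = part.get_filename() or ""
        is_html = ctype == "text/html" or name.lower().endswith((".html", ".htm"))
        if not is_html:
            continue
        # decode=True undoes base64 / quoted-printable transfer encoding,
        # so the patterns see the HTML itself rather than the encoded form.
        body = part.get_payload(decode=True) or b""
        if any(pattern.search(body) for pattern in ACTIVE_PATTERNS):
            yield name, ctype
```

Matching after the transfer encoding has been undone is the point of the
sketch: a rule that only sees the raw base64 text of an attachment cannot
match a script tag inside it.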
Regards,
KAM

From f at zz.de Tue Apr 11 10:23:53 2023
From: f at zz.de (Florian Lohoff)
Date: Tue, 11 Apr 2023 16:23:53 +0200
Subject: [Mimedefang] HTML Mail / Active content filter
In-Reply-To: <20230411075909.066b0c39@gato.skoll.ca>
References: <20230410093246.uso76bckwzwj5tm7@pax.zz.de> <20230411075909.066b0c39@gato.skoll.ca>
Message-ID: <20230411142353.qomzhfp7vrmbujnu@pax.zz.de>

Hi Dianne,

On Tue, Apr 11, 2023 at 07:59:09AM -0400, Dianne Skoll via MIMEDefang wrote:
> On Mon, 10 Apr 2023 11:32:46 +0200
> Florian Lohoff via MIMEDefang wrote:
>
> > i'd like to drop/replace HTML attachments/mails which contain active
> > components like javascript/javascript external refs.
>
> I think you'll find yourself blocking or damaging quite a lot of valid
> email.

Javascript in emails is sub 0.1% - It's basically not in use. All mails
i found in gigabytes of samples have been ads and crude stuff. I couldn't
find legitimate mail with javascript.

And after 3 weeks of downtime the mood is currently to even block
all Microsoft formats (docx, pptx, xlsx and the like), which we do right now.

So my biggest concern is mail with Javascript (which was the origin) and
PDF with active content.

> If you do find HTML mail where the "body" is essentially a
> document.write call on a function of a whole bunch of base64-encoded
> content, then yeah... that's probably malicious and can be dropped.
> Not exactly sure how to detect that, but IMO document.write in an HTML
> mail is suspicious enough on its own to block.

Flo
--
Florian Lohoff f at zz.de
Any sufficiently advanced technology is indistinguishable from magic.

From dianne at skoll.ca Tue Apr 11 10:38:36 2023
From: dianne at skoll.ca (Dianne Skoll)
Date: Tue, 11 Apr 2023 10:38:36 -0400
Subject: [Mimedefang] HTML Mail / Active content filter
In-Reply-To: <20230411142353.qomzhfp7vrmbujnu@pax.zz.de>
References: <20230410093246.uso76bckwzwj5tm7@pax.zz.de> <20230411075909.066b0c39@gato.skoll.ca> <20230411142353.qomzhfp7vrmbujnu@pax.zz.de>
Message-ID: <20230411103836.54836a71@gato.skoll.ca>

On Tue, 11 Apr 2023 16:23:53 +0200
Florian Lohoff wrote:

> Javascript in emails is sub 0.1% - It's basically not in use.

I just checked my inbox. Email notifications from Airbnb use Javascript.
So you will definitely block valid (for some interpretation of "valid")
email if you block all email with Javascript.

However, if you want to do it, then blocking any HTML part with a