[Mimedefang] Blocking outbound Office documents

John Rudd john at rudd.cc
Fri Feb 16 15:29:17 EST 2007


Ben Kamen wrote:
> Jonas Eckerman wrote:
>>
>> When Outlook uses Word for editing email the result is bloated, ugly, 
>> idiotic HTML code.
> 
> A long time ago I sent my resume (hand written .HTML) to a VP friend for
> review. He looked at it in Word.. made a couple of minor wording changes
> and sent it back. (in .HTML)
> 
> It went from 7K to 38K or so in size just from the conversion.
> 
> Nice.
> 

Speaking of that...

Does anyone have a "BadHTML to GoodHTML" filter?  One that cleans up all 
of the useless crap that Word and Netscape (and probably other HTML 
editing apps) throw into an HTML file when they edit it?

Things like:

1) redundant tags, like the way Word will put a font-related tag at the 
start of every block of text,

2) the "hey, look at me, my application once touched this document!" 
type tags that Netscape throws into documents,

3) options for removing things like CSS and JavaScript crud that might 
be in an HTML document (or references to external documents that might 
contain them, such as an external style sheet),

4) options for removing dangerous tags like iframe and object.

Things like that...
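
To make the wish list above a bit more concrete, here is a rough sketch 
of the kind of filter I mean, using only Python's standard html.parser 
module. The names (HtmlCleaner, clean_html, the tag sets) are made up 
for illustration, and the policy is deliberately crude: it drops every 
font tag rather than only the redundant ones, and it throws away all 
comments. Treat it as a starting point under those assumptions, not a 
finished tool.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy sets; adjust to taste.
DROP_WITH_CONTENT = {"script", "style", "iframe", "object"}  # remove tag and body
DROP_TAG_ONLY = {"font"}  # remove the tag but keep the text inside it


class HtmlCleaner(HTMLParser):
    """Re-emit HTML while dropping unwanted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_WITH_CONTENT element

    @staticmethod
    def _is_editor_residue(tag, attrs):
        # "My application touched this" meta tags and external style sheets.
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "generator":
            return True
        if tag == "link" and "stylesheet" in (a.get("rel") or "").lower():
            return True
        return False

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag in DROP_TAG_ONLY:
            return
        if self._is_editor_residue(tag, attrs):
            return
        # Strip inline CSS and event-handler attributes (onclick, onload, ...).
        kept = [(k, v) for k, v in attrs
                if k != "style" and not k.startswith("on")]
        text = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in kept
        )
        self.out.append(f"<{tag}{text}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit it as a plain start tag.
        if tag not in DROP_WITH_CONTENT:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag in DROP_TAG_ONLY:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments are not re-emitted: HTMLParser's default handle_comment
    # does nothing, so Word's conditional comments simply disappear.


def clean_html(source):
    parser = HtmlCleaner()
    parser.feed(source)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Feeding it something like 
'<p style="margin:0"><font face="Arial">Hi</font></p><script>x()</script>' 
should give back just '<p>Hi</p>'. Anything security-sensitive (email 
attachments, say) would need a much stricter whitelist approach than 
this blacklist sketch.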

(I ask for a completely unrelated reason: we're looking at how to deal 
with HTML documents in our new Plone repository. Some documents will be 
edited in vi, some in Word, etc., and we don't want the Word ones to 
become unwieldy. But having it also be something that could be used to 
massage HTML email attachments might be cool too.)




