[Mimedefang] How to parse pdf files or pass them to spamassassin

Dianne Skoll dfs at roaringpenguin.com
Fri May 29 10:02:52 EDT 2015


On Fri, 29 May 2015 15:38:33 +0200
Benoit Panizzon <benoit.panizzon at imp.ch> wrote:

> => Extract text from PDF and pass it to spamassassin to match
> blacklisted URI's within the PDF.

There is a program called pdftotext, which on Debian systems is part
of the poppler-utils package.  I'm sure it's packaged in most Linux distros.
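
As a rough illustration of the extraction step, here is a minimal sketch
in Python (chosen only for illustration; a MIMEDefang filter itself is
Perl).  The function names pdftotext_command and pdf_to_text and the
30-second timeout are assumptions made for this example, not part of
MIMEDefang or poppler.  It relies only on pdftotext writing to standard
output when the output file is given as "-".

```python
import shutil
import subprocess


def pdftotext_command(pdf_path):
    """Build the pdftotext command line (hypothetical helper)."""
    # "-" as the output file asks pdftotext to write the text to stdout.
    return ["pdftotext", pdf_path, "-"]


def pdf_to_text(pdf_path, timeout=30):
    """Return the text extracted from pdf_path, or "" if extraction fails."""
    if shutil.which("pdftotext") is None:
        # pdftotext is not installed; treat the PDF as having no text.
        return ""
    try:
        result = subprocess.run(
            pdftotext_command(pdf_path),
            capture_output=True,
            timeout=timeout,  # assumed limit so a bad PDF cannot stall scanning
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # Non-zero exit, timeout, or failure to start: no text available.
        return ""
    return result.stdout.decode("utf-8", errors="replace")
```

Returning an empty string on any failure is a design choice for this
sketch: a broken or encrypted PDF then simply contributes nothing to the
scan instead of aborting it.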

So I'm thinking you could run the PDF through that, add a text/plain part
to INPUTMSG with MIME::tools and pass that to SpamAssassin.  You wouldn't
actually modify the original message; just temporarily add the text/plain
part.  Something like this:

1) Convert the PDFs to text and add the text as text/plain attachments
   with MIME::tools methods.

2) Rename ./INPUTMSG to ./INPUTMSG.ORIG

3) Write out the modified message to ./INPUTMSG

4) Call SpamAssassin

5) Rename ./INPUTMSG.ORIG to ./INPUTMSG

I haven't tried this, but it seems that it should work.
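
The five steps above could be sketched roughly as follows.  This is an
untested-in-production illustration in Python, not MIMEDefang code: a real
filter would do the same thing in Perl with MIME::tools.  The names
add_pdf_text_parts and scan_with_modified_message, and the extract_text
and scan callables passed to them, are hypothetical; scan stands in for
whatever actually invokes SpamAssassin.  The sketch only handles messages
that are already multipart.

```python
import os
from email import message_from_bytes
from email.mime.text import MIMEText


def add_pdf_text_parts(raw_message, extract_text):
    """Step 1: return a copy of the message with PDF text attached.

    extract_text is a caller-supplied callable (an assumption of this
    sketch) that takes the PDF bytes and returns the extracted text.
    """
    msg = message_from_bytes(raw_message)
    if not msg.is_multipart():
        return raw_message  # limitation of this sketch
    texts = []
    for part in msg.walk():
        if part.get_content_type() == "application/pdf":
            pdf_bytes = part.get_payload(decode=True)
            if pdf_bytes:
                texts.append(extract_text(pdf_bytes))
    texts = [t for t in texts if t.strip()]
    if not texts:
        return raw_message  # nothing to add; leave the message alone
    for text in texts:
        # Attach at the top level so the scanner sees it as a text part.
        msg.attach(MIMEText(text, "plain", "utf-8"))
    return msg.as_bytes()


def scan_with_modified_message(workdir, modified_bytes, scan):
    """Steps 2-5: swap in the modified message, scan it, restore the original."""
    inputmsg = os.path.join(workdir, "INPUTMSG")
    backup = inputmsg + ".ORIG"
    os.rename(inputmsg, backup)              # step 2
    try:
        with open(inputmsg, "wb") as fh:     # step 3
            fh.write(modified_bytes)
        return scan(inputmsg)                # step 4
    finally:
        os.replace(backup, inputmsg)         # step 5, even if scanning fails
```

Restoring the original in a finally block means the message that gets
delivered is never the modified one, even if the scan step raises an
error.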

Regards,

Dianne


