[Mimedefang] Problems training Bayes with no Message-ID:

David F. Skoll dfs at roaringpenguin.com
Wed May 7 13:00:09 EDT 2003


On Wed, 7 May 2003, Nels Lindquist wrote:

> The SA Bayes system tracks messages it's already "learned" by storing
> Message-IDs in the bayes_seen DB file.

This is, IMO, a serious flaw.  Message-IDs are under the control of
the sender, and SpamAssassin should not rely on them for anything
important.  It's much better to use a hash of the message contents
to detect duplicates.
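
To illustrate what hashing the contents could look like, here is a minimal
Python sketch. It is not how SpamAssassin works; the function names
(content_digest, is_duplicate), the choice of SHA-1, and the decision to hash
only the body are all assumptions made for the example.

```python
import hashlib


def content_digest(raw_message: bytes) -> str:
    """Return a hex digest of the message body (hypothetical helper)."""
    # Normalize line endings so CRLF and LF copies of a message hash alike.
    normalized = raw_message.replace(b"\r\n", b"\n")
    # The body starts after the first blank line; headers are ignored
    # because they are sender-controlled or change in transit.
    _headers, separator, body = normalized.partition(b"\n\n")
    if not separator:
        # No blank line found: fall back to hashing the whole message.
        body = normalized
    return hashlib.sha1(body).hexdigest()


def is_duplicate(raw_message: bytes, seen: set) -> bool:
    """Record the message digest in `seen`; return True if already present."""
    digest = content_digest(raw_message)
    if digest in seen:
        return True
    seen.add(digest)
    return False
```

With this approach, two copies of the same spam count as one message even if
the sender gave them different Message-IDs, or none at all.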

But since we can't change that...

> I currently have a script which parses my quarantine directory and
> copies the spam to a single directory, where it's more efficient to
> call "sa-learn --dir".  I could modify it to add a Message-ID header
> as it does.  Any suggestions on format?  Can I just use the Sendmail
> Queue-ID?

I would construct one that contains more "uniqifying" information,
like:

	<timestamp.qid@full.host.name>

Something like:

	<1052326659.h47Gj2l0001683@mail.roaringpenguin.com>

(A valid Message-ID must look like an e-mail address: local.part@domain.part)
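
A script that copies quarantined spam could add such a header along these
lines. This is only a sketch in Python: the function names
(make_message_id, ensure_message_id) are hypothetical, the queue ID and host
name are assumed to be supplied by the caller, and the ID is built in the
timestamp.qid form suggested above, with "@" separating it from the host name.

```python
import time


def make_message_id(queue_id, hostname, timestamp=None):
    """Build a Message-ID of the form <timestamp.qid@full.host.name>."""
    if timestamp is None:
        timestamp = int(time.time())
    return "<%d.%s@%s>" % (timestamp, queue_id, hostname)


def has_message_id(raw_message):
    """Return True if the header block already contains a Message-ID."""
    for line in raw_message.splitlines():
        if line == "":
            # A blank line ends the headers; stop before the body.
            break
        if line.lower().startswith("message-id:"):
            return True
    return False


def ensure_message_id(raw_message, queue_id, hostname, timestamp=None):
    """Prepend a generated Message-ID header only when none is present."""
    if has_message_id(raw_message):
        return raw_message
    header = "Message-ID: " + make_message_id(queue_id, hostname, timestamp)
    return header + "\n" + raw_message
```

For example, make_message_id("h47Gj2l0001683", "mail.roaringpenguin.com",
1052326659) yields the ID shown above. Messages that already carry a
Message-ID are left untouched.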

Regards,

David.


