[Mimedefang] Problems training Bayes with no Message-ID:
David F. Skoll
dfs at roaringpenguin.com
Wed May 7 13:00:09 EDT 2003
On Wed, 7 May 2003, Nels Lindquist wrote:
> The SA Bayes system tracks messages it's already "learned" by storing
> Message-IDs in the bayes_seen DB file.
This is, IMO, a serious flaw. Message-IDs are under the control of
the sender, and SpamAssassin should not rely on them for anything
important. It's much better to use a hash of the message contents
to detect duplicates.
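
A minimal sketch of that idea, not how SpamAssassin actually works: hash only the decoded body parts, so that sender-controlled headers such as Message-ID cannot change the result. The names content_digest, is_duplicate and seen are hypothetical, and the in-memory set merely stands in for a persistent store like bayes_seen.

```python
import hashlib
from email import message_from_bytes

def content_digest(raw_message: bytes) -> str:
    """Return a SHA-256 hex digest of the message body, ignoring all headers."""
    msg = message_from_bytes(raw_message)
    h = hashlib.sha256()
    for part in msg.walk():
        # Multipart containers carry no payload of their own; hash the leaves.
        if part.is_multipart():
            continue
        h.update(part.get_payload(decode=True) or b"")
    return h.hexdigest()

# Hypothetical stand-in for a persistent "already learned" database.
seen = set()

def is_duplicate(raw_message: bytes) -> bool:
    """Record the message digest and report whether it was seen before."""
    digest = content_digest(raw_message)
    if digest in seen:
        return True
    seen.add(digest)
    return False
```

With this approach, two copies of the same spam that differ only in their headers produce the same digest.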
But since we can't change that...
> I currently have a script which parses my quarantine directory and
> copies the spam to a single directory, where it's more efficient to
> call "sa-learn --dir". I could modify it to add a Message-ID header
> as it does. Any suggestions on format? Can I just use the Sendmail
> Queue-ID?
I would construct one that contains more "uniqifying" information,
like:
<timestamp.qid@full.host.name>
Something like:
<1052326659.h47Gj2l0001683@mail.roaringpenguin.com>
(A valid Message-ID must look like an e-mail address: local.part@domain.part)
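
A rough sketch of how a quarantine-copying script might add such a header before calling sa-learn. The function names make_message_id and ensure_message_id are hypothetical, and the sketch assumes each quarantined file is a raw message that starts directly with its headers (no mbox "From " line) and that the caller already knows the queue ID and host name.

```python
import time
from email import message_from_bytes

def make_message_id(qid: str, hostname: str, timestamp=None) -> str:
    """Build a Message-ID of the form <timestamp.qid@full.host.name>."""
    if timestamp is None:
        timestamp = int(time.time())  # seconds since the Unix epoch
    return f"<{timestamp}.{qid}@{hostname}>"

def ensure_message_id(raw_message: bytes, qid: str, hostname: str,
                      timestamp=None) -> bytes:
    """Return the message unchanged if it has a Message-ID, else prepend one."""
    msg = message_from_bytes(raw_message)
    if msg["Message-ID"] is not None:
        return raw_message
    header = "Message-ID: " + make_message_id(qid, hostname, timestamp) + "\n"
    # Prepend to the raw bytes so the rest of the message is left untouched.
    return header.encode("ascii") + raw_message
```

Prepending the header rather than re-serializing the parsed message keeps the rest of the spam byte-for-byte intact, which seems safer for malformed mail.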
Regards,
David.