[Mimedefang] Problems training Bayes with no Message-ID:
Nels Lindquist
nlindq at maei.ca
Wed May 7 12:45:00 EDT 2003
I've run into a bit of an annoyance while training SA's Bayesian
classifier on my quarantined spam directory.
The problem occurs with messages that arrived without a Message-ID:
header from a previous relay. MIMEDefang's
action_quarantine_entire_message() subroutine saves the message as it
stood during milter processing, before Sendmail adds its own
Message-ID: header, so the quarantined copy has no Message-ID at all.
The SA Bayes system tracks messages it's already "learned" by storing
Message-IDs in the bayes_seen DB file. With no Message-ID, there's
no way to determine if the message was already seen, and such
messages aren't dealt with consistently.
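To make the failure mode concrete, here is a simplified Python model of "seen" tracking keyed on Message-ID. It assumes the seen database maps each Message-ID to the class the message was learned as; the SeenDB class and msgid_of() helper are hypothetical illustrations, not SpamAssassin's bayes_seen code.

```python
from email.parser import BytesHeaderParser


def msgid_of(raw: bytes):
    """Return the Message-ID header value, or None if the message has none."""
    headers = BytesHeaderParser().parsebytes(raw)
    return headers.get("Message-ID")


class SeenDB:
    """Toy stand-in for a seen database keyed on Message-ID (not SA's code)."""

    def __init__(self):
        self.seen = {}  # Message-ID -> label it was learned as

    def learn(self, raw: bytes, label: str) -> bool:
        """Return True if the message is (re)learned, False if skipped."""
        msgid = msgid_of(raw)
        if msgid is not None and self.seen.get(msgid) == label:
            return False  # already learned with this label: skip it
        if msgid is not None:
            self.seen[msgid] = label
        # With no Message-ID there is nothing to remember the message by,
        # so every training run learns it again.
        return True
```

In this model, a message with a Message-ID is learned once and skipped afterwards, while a message without one is learned again on every pass over the quarantine directory.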
I don't know if it's worth adding a Message-ID during the quarantine
process, since this would only be of value to those using sa-learn in
similar circumstances.
I currently have a script which parses my quarantine directory and
copies the spam to a single directory, where it's more efficient to
call "sa-learn --dir". I could modify it to add a Message-ID: header
as it copies each message. Any suggestions on format? Can I just use
the Sendmail Queue-ID?
----
Nels Lindquist <*>
Information Systems Manager
Morningstar Air Express Inc.