[Mimedefang] Problems training Bayes with no Message-ID:

Nels Lindquist nlindq at maei.ca
Wed May 7 12:45:00 EDT 2003


I've run into a bit of an annoyance while training SA's Bayesian 
classifier on my quarantined spam directory.

The problem occurs for messages which don't have a Message-ID: header 
added by a previous relay.  Since MIMEDefang's 
action_quarantine_entire_message() subroutine preserves the state of 
the message at the time of milter processing, Sendmail hasn't yet 
added its own Message-ID: header.

The SA Bayes system tracks messages it's already "learned" by storing 
Message-IDs in the bayes_seen DB file.  With no Message-ID, there's 
no reliable way to determine whether the message was already seen, so 
such messages aren't dealt with consistently.

I don't know if it's worth adding a Message-ID during the quarantine 
process, since this would only be of value to those using sa-learn in 
similar circumstances.

I currently have a script which parses my quarantine directory and 
copies the spam to a single directory, where it's more efficient to 
call "sa-learn --dir".  I could modify it to add a Message-ID header 
as it copies each message.  Any suggestions on format?  Can I just 
use the Sendmail Queue-ID?
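
For illustration, here is a rough sketch of the kind of change I have in 
mind, written in Python.  It is not my actual script: the function names, 
the queue ID and the hostname are all made up for the example.  It 
assumes a bare queue ID alone isn't enough, since as far as I understand 
RFC 2822 a Message-ID should look like <left@right>, so the queue ID is 
used as the left-hand side and the local hostname as the right-hand side.  
The ID is derived only from those two values, so copying the same 
quarantined message twice yields the same header.

```python
from email.parser import BytesHeaderParser


def make_message_id(queue_id: str, hostname: str) -> str:
    # Hypothetical format: <queue-id@hostname>. Deterministic on purpose,
    # so the same quarantined message always gets the same ID.
    return f"<{queue_id}@{hostname}>"


def ensure_message_id(raw: bytes, queue_id: str, hostname: str) -> bytes:
    """Return raw unchanged if it already has a Message-ID header,
    otherwise return a copy with a generated one added at the top."""
    # Parse only the header block; header lookup is case-insensitive.
    headers = BytesHeaderParser().parsebytes(raw)
    if headers["Message-ID"] is not None:
        return raw

    # Match the line ending used by the first line of the message.
    first_line, newline, rest = raw.partition(b"\n")
    eol = b"\r\n" if first_line.endswith(b"\r") else b"\n"
    header = ("Message-ID: " + make_message_id(queue_id, hostname)).encode("ascii")

    # If the file starts with an mbox-style "From " line, keep that line
    # first and put the new header right after it.
    if raw.startswith(b"From ") and newline:
        return first_line + newline + header + eol + rest
    return header + eol + raw
```

The copy step would then call ensure_message_id() on each message's raw 
bytes, with that message's queue ID, before writing it to the training 
directory.  Messages that already carry a Message-ID pass through 
untouched.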

----
Nels Lindquist <*>
Information Systems Manager
Morningstar Air Express Inc.



