[Mimedefang] two md_check_against_smtp_server questions

Sun Dec 3 21:40:37 EST 2006

1) What does MD fill in if you leave the $helo argument blank?  Does it 
fill in the host's own hostname, or does it try to send a blank HELO?  
I have one mimedefang-filter that I deploy on 5 machines, so it'd be 
nice not to have to customize this per host.  If MD doesn't fill in a 
blank with "the right thing", can I make this a feature request?

2) Has anyone set up a means of caching results?  I don't want to hit my 
back-line servers constantly with these requests.  I would prefer to 
have results cached for, say, 2 hours.  I'm trying to think of a good 
way to do this.

One thought I had was to have each machine keep its own database, 
outside the filter processes, where the email address is the key and 
the value holds two fields: time last checked, and account state (ok, 
unknown, over-quota).  Then I'd process it like this:

if (address is cached) && ((now - last_checked) <= cache_life)
    use the cached result
else
    if the address is valid (via md_check_against_smtp_server() )
       if the address is an account
          if the account is over quota
             state = over-quota
          else
             state = ok
       else
          state = ok
    else
       state = unknown

    update the cache with the new result and last_checked time.
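
The lookup logic above can be sketched in a few lines.  This is only an 
illustration in Python, not filter code: a real mimedefang-filter would 
be Perl, and the three callables (is_valid, is_account, is_over_quota) 
are hypothetical stand-ins for md_check_against_smtp_server() and 
whatever account/quota lookups exist locally.  The cache is assumed to 
be any dict-like object mapping address to (last_checked, state).

```python
import time

CACHE_LIFE = 2 * 60 * 60  # cache lifetime in seconds (2 hours)


def classify(address, is_valid, is_account, is_over_quota):
    """Return 'ok', 'over-quota' or 'unknown' for an address.

    The three callables are hypothetical hooks supplied by the caller.
    """
    if not is_valid(address):
        return "unknown"
    if is_account(address) and is_over_quota(address):
        return "over-quota"
    # Valid address that is either under quota or not an account at all.
    return "ok"


def lookup(cache, address, is_valid, is_account, is_over_quota, now=None):
    """Return the cached state if it is fresh, otherwise re-check and store."""
    now = time.time() if now is None else now
    entry = cache.get(address)
    if entry is not None:
        last_checked, state = entry
        if now - last_checked <= CACHE_LIFE:
            return state  # fresh enough: no back-end call is made
    state = classify(address, is_valid, is_account, is_over_quota)
    cache[address] = (now, state)
    return state
```

Passing "now" explicitly makes the expiry behaviour easy to exercise 
without waiting two hours.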

Anyone have thoughts about good and bad ways to do that?

I could just store it in an in-memory hash, but then each child process 
keeps its own copy and does its own checks.  That's potentially 30 
children * 5 machines * 30,000 addresses = 4.5 million 
md_check_against_smtp_server() calls per 2 hours ... which doesn't even 
include the actual SMTP deliveries.

If I cache it in a local database, that's easy and cheap.  I then cut 
that down to 5 machines * 30,000 addresses, or 150,000 calls per 2 
hours.  Plus, I can potentially cut it further by having an external 
process that goes through and cleans things up every hour or so (seed 
the database with known good addresses from our account management 
system; do the quota checks so they don't have to be done in real time, 
etc.).  That might significantly cut down the number of calls.  And, if 
I'm really confident about the seeding process, I might even be able to 
omit the md_check_against_smtp_server() calls entirely, because the 
seeding process already told me everything I needed to know.
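
For reference, the two worst-case estimates above work out as follows.  
This is just the arithmetic from the figures already given (30 children, 
5 machines, 30,000 addresses), assuming every address is seen by every 
cache once per cache lifetime; the variable names are mine.

```python
CHILDREN_PER_MACHINE = 30
MACHINES = 5
ADDRESSES = 30_000

# One private cache per child process: every child re-checks every address.
per_child_cache = CHILDREN_PER_MACHINE * MACHINES * ADDRESSES

# One shared cache per machine: each machine checks every address once.
per_machine_cache = MACHINES * ADDRESSES

print(per_child_cache)                       # 4500000
print(per_machine_cache)                     # 150000
print(per_child_cache // per_machine_cache)  # 30
```

So sharing the cache per machine cuts the worst case by a factor equal 
to the number of children per machine.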

I could also use an external database server, but then I'm introducing 
points of failure into the process, and shifting "lots of calls to the 
backend server" to "lots of calls to the database server".

I'm sort of leaning toward the "local database" approach, but I've never 
really worked with tied hashes (Perl's tie) and the like before.
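
To make the "local database" idea concrete, here is a rough sketch of a 
per-machine on-disk cache.  It uses Python's standard dbm module purely 
as a stand-in for a Perl hash tied to a DBM file; the record format 
("timestamp|state") and the function names are assumptions for 
illustration only.

```python
import dbm
import time

CACHE_LIFE = 2 * 60 * 60  # cache lifetime in seconds (2 hours)


def cache_put(path, address, state, now=None):
    """Store 'timestamp|state' under the lower-cased address."""
    now = time.time() if now is None else now
    record = f"{int(now)}|{state}".encode()
    with dbm.open(path, "c") as db:  # "c": create the file if missing
        db[address.lower().encode()] = record


def cache_get(path, address, now=None):
    """Return the cached state, or None if it is missing or expired."""
    now = time.time() if now is None else now
    with dbm.open(path, "c") as db:
        raw = db.get(address.lower().encode())
    if raw is None:
        return None
    checked, state = raw.decode().split("|", 1)
    if now - int(checked) > CACHE_LIFE:
        return None  # stale: caller should re-check and cache_put() again
    return state
```

One caveat with this kind of file: with many child processes writing at 
once, some form of locking around the open/write is probably needed, 
and the sketch above does not attempt it.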