[Mimedefang] Mail::SpamAssassin::BayesStore::SQL databasepersistence.

Matthew Schumacher matt.s at aptalaska.net
Wed Jul 20 12:25:54 EDT 2005


David F. Skoll wrote:

> I should also note that there are many problems with the code,
> especially because it makes many assumptions about MySQL-like
> optimizations.  For example, the code to check if "too many" tokens
> will be deleted by an expire run is just plain stupid.
> The query to check if it's OK to do the expiry takes about as long as
> the actual expiry itself!
> 
> The private _get_oldest_token_age function in SQL.pm will also
> be dreadfully slow because it does a sequential scan.
> 

One thing I did to make this a little better is I found everywhere in
the code where min() and max() are used and created an index on the
column used.  In the case of _get_oldest_token_age simply creating an
index on the atime column made this much difference:

                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=220682.04..220682.04 rows=1 width=4) (actual
time=12267.656..12267.657 rows=1 loops=1)
   ->  Seq Scan on bayes_token  (cost=0.00..220542.42 rows=55844
width=4) (actual time=12267.645..12267.645 rows=0 loops=1)
         Filter: (id = 1)
 Total runtime: 12267.804 ms

                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17947.15..17947.15 rows=1 width=4) (actual
time=0.130..0.131 rows=1 loops=1)
   ->  Index Scan using bayes_token_pkey on bayes_token
(cost=0.00..17936.02 rows=4451 width=4) (actual time=0.121..0.121 rows=0
loops=1)
         Index Cond: (id = 1)
 Total runtime: 0.361 ms
(4 rows)

I looked at the other queries and they aren't to bad, so I think I have
the biggest problems pretty much solved except for database persistence.
 At this point it is holding up to my mail load, so I'm on the fence
with going back to bayes and having locking problems or just sticking
with this.

schu



More information about the MIMEDefang mailing list