[Mimedefang] graphdefang cores with large amounts of data

John Kirkland jpk at bl.org
Wed Oct 15 19:03:19 EDT 2003


Hey, everybody,

I guess it's about time for me to pipe up here.  You should download the
newest version of graphdefang -- version 0.9.  Use the "trim" option to
purge older data from your SummaryDB file.  A bug in previous versions
kept trim from removing all the data it should have.  After upgrading,
my SummaryDB shrank to about a tenth of its former size.

For those of you processing millions of messages... ummm...  I didn't
design this thing with that in mind.  There is quite a bit of data
replication (email addresses, especially) in the existing database.  I've
started working on a relational DB model using an embedded DB engine
(SQLite).  It would be very easy to use another relational DB (MySQL,
Oracle, etc.) once this is in place, but my goal is for a user to be able
to download graphdefang and just use it without having to also install and
configure a database.
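
To illustrate the data replication point, here is a rough sketch of what
a normalized layout could look like.  It is written in Python rather than
Perl, and the table names, column names, and helper functions (open_db,
address_id, record_event, trim) are all made up for the example; this is
not graphdefang's actual schema or code.  The idea is simply that each
email address is stored once and events refer to it by id.

```python
import sqlite3

def open_db(path=":memory:"):
    """Create the (hypothetical) tables if they do not exist yet."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS address (
            id    INTEGER PRIMARY KEY,
            email TEXT NOT NULL UNIQUE      -- each address stored once
        );
        CREATE TABLE IF NOT EXISTS event (
            ts        INTEGER NOT NULL,     -- Unix time of the log entry
            kind      TEXT    NOT NULL,     -- e.g. 'spam', 'virus'
            sender_id INTEGER NOT NULL REFERENCES address(id)
        );
    """)
    return conn

def address_id(conn, email):
    """Return the id for an address, inserting it only the first time."""
    conn.execute("INSERT OR IGNORE INTO address (email) VALUES (?)", (email,))
    row = conn.execute(
        "SELECT id FROM address WHERE email = ?", (email,)
    ).fetchone()
    return row[0]

def record_event(conn, ts, kind, sender):
    """Store one event, referring to the sender by id instead of by text."""
    conn.execute(
        "INSERT INTO event (ts, kind, sender_id) VALUES (?, ?, ?)",
        (ts, kind, address_id(conn, sender)),
    )

def trim(conn, cutoff_ts):
    """Drop events older than cutoff_ts, then addresses nothing refers to."""
    conn.execute("DELETE FROM event WHERE ts < ?", (cutoff_ts,))
    conn.execute(
        "DELETE FROM address WHERE id NOT IN (SELECT sender_id FROM event)"
    )
```

With a layout like this, a sender who shows up a million times costs one
row in the address table plus a small integer per event, and trimming old
data becomes a pair of DELETE statements.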

Other comments on this thread:

RE:  File::ReadBackwards... This is not causing the performance issues.
It opens a handle to a file, seeks to the very end of the file, and then
backs up several K at a time, splitting the lines on "\n".  It's fast,
efficient, and pretty neat.
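
For anyone curious, the technique described above can be sketched in a
few lines.  This is a Python illustration of the general idea only, not
the Perl module's code; the function name read_lines_backwards and the
8192-byte default chunk size are my own choices for the example.

```python
import os

def read_lines_backwards(path, chunk_size=8192):
    """Yield the lines of a file, last line first, one chunk at a time."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)       # jump straight to the end of the file
        pos = fh.tell()
        if pos == 0:
            return                    # empty file: no lines at all
        tail = b""                    # partial line carried between chunks
        first = True
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            fh.seek(pos)              # back up one chunk
            chunk = fh.read(step)
            if first:
                first = False
                # Ignore the newline that terminates the last line.
                if chunk.endswith(b"\n"):
                    chunk = chunk[:-1]
            lines = (chunk + tail).split(b"\n")
            # The first piece may be cut off mid-line; hold it back until
            # the preceding chunk has been read.
            tail = lines.pop(0)
            for line in reversed(lines):
                yield line.decode("utf-8", errors="replace")
        yield tail.decode("utf-8", errors="replace")
```

Only one chunk plus one partial line is held in memory at a time, which
is why reading the tail of a large log this way is cheap.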

RE:  Memory usage of graphdefang.  Yes, it uses a lot!  It uses a lot
while parsing and a lot while creating the graphs.  I've looked here and
there, but it's not obvious to me how to make it not do this... other than
cleaning up the data model.

I'm gonna read through the rest of the thread and see if there are any
other topics to comment on...

Best Regards,
John

On Wed, 15 Oct 2003, cam wrote:

> On Tue, 14 Oct 2003, Patrick Morris wrote:
>
> > Happens to me constantly.  I'm lucky if I can keep a week's worth of
> > data without losing my SummaryDB file, and I've only got 500 or so users
> > (though, admittedly, a much higher mail and spam volume than the
> > average 500-user site).
>
>
>
> i had this happen to me last week sometime (don't remember off hand exact
> day without looking).  processing time had gotten up to about 10 to 12
> minutes per run (running once per hour) even after trimming down the
> number of graphs built.  the db was ~500 Meg at this point.  even with 2g
> ram the box was taking a serious performance hit for a while due to
> swapping to handle the 1.6g to 1.8g process.  /sigh
>
> cam


