[Mimedefang] graphdefang 0.9, --trim, memory usage, other comments

Chris Stromsoe cbs at cts.ucla.edu
Mon Nov 3 15:17:06 EST 2003


I'm attaching a patch to graphdefang.pl and graphdefanglib.pl that
implements most of the changes listed below.  The patch implements an
"average" keyword, which is like summary but tracks an average over time
rather than a total.  It also draws the graphs when the --trim option is
given, rather than exiting, though it does so inelegantly: it writes the
data out and then re-reads it, rather than keeping it all in memory.  It
also adds several command line options to specify the config file name,
the path to the summary file, and the summary .db file name.
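
For readers who want the idea behind the "average" keyword without
reading the diff, here is a minimal sketch of a running average that
folds in one sample at a time.  It follows the same
(old * count + new) / (count + 1) shape as the patch, but the function
name update_average and the sample values are made up for illustration
and are not part of graphdefang.

```perl
use strict;
use warnings;

# Fold one new sample into a running mean without storing the samples.
# $mean   - mean of the samples seen so far (undef if there are none)
# $count  - number of samples already folded into $mean
# $sample - the new value
# update_average is a hypothetical helper, not a graphdefang function.
sub update_average {
    my ($mean, $count, $sample) = @_;
    $mean = 0 unless defined $mean;
    return ($mean * $count + $sample) / ($count + 1);
}

my ($mean, $count) = (undef, 0);
for my $sample (2, 4, 9) {
    $mean = update_average($mean, $count, $sample);
    $count++;
}
print "average: $mean\n";
```

The caller has to keep the sample count alongside the mean; without it
the next update cannot weight the old value correctly.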

Because it may or may not be useful, I'm also attaching a short perl
script that I use to merge .db files from multiple machines into a single
file for centralized graphing of the same resources.
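
The attached script itself is not reproduced here.  As a rough sketch
of the general approach, and not the attached script, the following
assumes each machine's .db has already been loaded into a nested hash
shaped like $data{$timesummary}{$summarytime}{$event}{'summary'} and
merges two of them by adding the counts.  The name merge_counts and the
sample data are hypothetical.  Adding is only appropriate for counts;
values such as 'average' or 'maxhosttime' would need their own rule.

```perl
use strict;
use warnings;

# Recursively add the counts in $src into $dst.  Inner nodes are hash
# references, leaves are numeric counts.  merge_counts is a hypothetical
# helper name, not part of graphdefang.
sub merge_counts {
    my ($dst, $src) = @_;
    for my $key (keys %$src) {
        if (ref($src->{$key}) eq 'HASH') {
            $dst->{$key} = {} unless ref($dst->{$key}) eq 'HASH';
            merge_counts($dst->{$key}, $src->{$key});
        } else {
            $dst->{$key} += $src->{$key};
        }
    }
    return $dst;
}

# Made-up data for two machines covering the same hour.
my %host_a = (hourly => { 1067900400 => { spam => {
    summary => 10,
    sender  => { 'a@example.com' => 3 },
} } });
my %host_b = (hourly => { 1067900400 => { spam => {
    summary => 5,
    sender  => { 'a@example.com' => 2, 'b@example.com' => 1 },
} } });

my %merged;
merge_counts(\%merged, \%host_a);
merge_counts(\%merged, \%host_b);

my $spam = $merged{hourly}{1067900400}{spam};
print "spam total: $spam->{summary}\n";
```

Merging into a fresh %merged hash leaves the per-machine data
untouched, so the same inputs can be merged again later.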


-Chris

On Tue, 28 Oct 2003, Chris Stromsoe wrote:

> I'm running graphdefang 0.9 to graph a data-set of roughly 800,000 to
> 1,200,000 mails per day, depending on the day.  For the most part, I'm
> pretty happy with it.  I do have some comments and questions.
>
> First the comments.
>
> I've made some modifications to the source and have added an "average"
> keyword that functions similarly to summary, but tracks an average over
> time.  If there is interest in integrating upstream, I'd be more than
> happy to take a diff and send it on.
>
> Running with --trim doesn't draw the graphs.  This means that I'm having
> to run graphdefang twice every time I need the graphs built.  After I
> think more about how I want to do it, I'm going to modify it so that
> graphdefang continues graphing after it trims, rather than exiting.  If
> there's interest, I'll forward a patch for that, too.
>
> graphdefang 0.9 can run with one data file or multiple data files.  I've
> added the ability to run without any data files and draw graphs based
> only on the .db file.  I'm also going to add the ability to import data
> from multiple .db files.  Right now, I am pre-processing .db files from
> several machines into a unified file and then running graphdefang
> against the unified .db to create a single graph.  I need to cleanly
> integrate the pre-processor into the graphdefang framework.  Again, if
> there is interest, I will forward that patch as well.
>
> I've made several other changes (configurable summary db name, summary
> db path, and config file name).  If there's interest...
>
> And, now the questions.  Well, question.
>
> As others have mentioned, memory usage is a problem.  I'm regularly
> seeing in excess of 1GB of RAM in use when graphdefang runs.  I'm only
> 12 days into my data-set and have no idea how much more will be eaten
> when I get a full month's worth of data to graph.  Any ideas for
> reducing memory usage?  One that I've had is to stop loading all of the
> data from the tie() and work off disk instead, which would have the
> unfortunate side effect of slowing down processing.  Anything else?
>
>
>
> -Chris
> _______________________________________________
> MIMEDefang mailing list
> MIMEDefang at lists.roaringpenguin.com
> http://lists.roaringpenguin.com/mailman/listinfo/mimedefang
>
-------------- next part --------------
--- /usr/local/src/antivirus/graphdefang-0.9/graphdefang.pl	Sun Jun 22 15:04:45 2003
+++ ./graphdefang.pl	Tue Oct 28 17:05:27 2003
@@ -25,7 +25,7 @@
 #=============================================================================
 
 use strict;
-use vars qw($MYDIR $OUTPUT_DIR $SUMMARYDB $QUIET $NODB $DATAFILE @DATAFILES @GRAPHS %TZ);
+use vars qw($MYDIR $OUTPUT_DIR $SUMMARYDB $QUIET $NOFILE $NODB $DATAFILE @DATAFILES @GRAPHS %TZ);
 
 # Argument parsing
 use Getopt::Long;
@@ -33,19 +33,29 @@
 
 $QUIET = 0;	# No output
 $NODB = 0;	# Don't use SummaryDB, just produce charts from logfile
+$NOFILE = 0;	# Don't use a logfile, only the summarydb
 my $trim = 0;	# Trim database
 my $nomax = 0;	# Ignore max date/time
 my $help = 0;	# Show help?
 my $man = 0;	# Show bigger help?
 my $file;	# Log file to parse (optional)
-
-GetOptions( 	'quiet'  => \$QUIET,
-		'nodb' 	 => \$NODB,
-		'trim'   => \$trim,
-		'nomax'  => \$nomax,
-		'help|?' => \$help,
-		'man'	 => \$man,
-		'file=s' => \$file ) or pod2usage(2);;
+my $dbpath;	# Path to the db file
+my $summary;	# db file name
+my $cfile;	# config file name
+my $DataSummary;
+
+
+GetOptions( 	'quiet'		=> \$QUIET,
+		'nodb'		=> \$NODB,
+		'nofile'	=> \$NOFILE,
+		'dbpath=s'	=> \$dbpath,
+		'summary=s'	=> \$summary,
+		'conf=s'	=> \$cfile,
+		'trim'		=> \$trim,
+		'nomax'		=> \$nomax,
+		'help|?'	=> \$help,
+		'man'		=> \$man,
+		'file=s'	=> \$file ) or pod2usage(2);;
 
 pod2usage(1) if $help;
 pod2usage(-exitstatus => 0, -verbose => 2) if $man;
@@ -55,7 +65,8 @@
 ($MYDIR) = (File::Basename::dirname($0) =~ /(.*)/);
 
 # Get graph configurations
-require("$MYDIR/graphdefang-config");
+$cfile = "graphdefang-config" if ! defined $cfile;
+require("$MYDIR/$cfile");
 
 # Require the graphdefang library file
 require ("$MYDIR/graphdefanglib.pl");
@@ -64,35 +75,35 @@
 # Path to summary database
 #
 
-$SUMMARYDB = "$MYDIR/SummaryDB.db";
+$dbpath = $MYDIR if ! defined $dbpath;
+$summary = "SummaryDB.db" if ! defined $summary;
+$SUMMARYDB = "$dbpath/$summary";
 
 # Do we do a database trim?
 if ($trim) {
 	print STDERR "Beginning SummaryDB Trim\n" if (!$QUIET);
 	trim_database();
 	print STDERR "Completed SummaryDB Trim\n" if (!$QUIET);
-	exit;
 }
 
 # Did the user specify a file on the command line?
 $DATAFILE = $file if (defined($file));
 
-my %DataSummary;
 
-if ($DATAFILE) {
+if ($DATAFILE || $NOFILE) {
 
 	print STDERR "Processing data file: $DATAFILE\n" if (!$QUIET);
 
 	# Open DATAFILE and Summarize It
 
-	%DataSummary = read_and_summarize_data($DATAFILE, $nomax)
+	$DataSummary = read_and_summarize_data($DATAFILE, $nomax)
         	or die "No valid mimedefang logs in $DATAFILE";
 
 } elsif (@DATAFILES) {
 
 	foreach my $datafile (@DATAFILES) {
 		print STDERR "Processing data file: $datafile\n" if (!$QUIET);
-		%DataSummary = read_and_summarize_data($datafile, $nomax)
+		$DataSummary = read_and_summarize_data($datafile, $nomax)
 			or die "No valid mimedefang logs in $datafile";
 	}
 } else {
@@ -104,7 +115,7 @@
 
 # Draw graphs
 foreach my $settings (@GRAPHS) {
-	graph(\%{$settings}, \%DataSummary);
+	graph(\%{$settings}, $DataSummary);
 }
 
 __END__
@@ -121,8 +132,12 @@
   --help            brief help message
   --man             full documentation
   --quiet           quiet output
+  --nofile          do not use a log file
   --nodb            do not update SummaryDB
   --trim            trim the SummaryDB
+  --dbpath          path to summary files
+  --summary         summary file name
+  --conf            configuration file name
   --nomax           ignore the max date/time in SummaryDB
   --file            optional log file to parse
 
@@ -145,6 +160,10 @@
 
 Do not produce status output from mimedefang.pl.
 
+=item B<--nofile>
+
+Do not use a log file, just parse the summaryDB and draw graphs from it.
+
 =item B<--nodb>
 
 Do not use nor update the SummaryDB, just parse the file and draw graphs from it.
@@ -156,6 +175,18 @@
 2.  daily data older than 1.25x$NUM_DAYS_SUMMARY days
 3.  all but top 25 sender, recipient, value1, value2, subject values
     for all dates prior to the current hour, day, and month..
+
+=item B<--dbpath>
+
+Specify path to the summary db file.
+
+=item B<--summary>
+
+Specify the summary db file name.
+
+=item B<--conf>
+
+Specify the configuration file name.
 
 =item B<--nomax>
 
--- /usr/local/src/antivirus/graphdefang-0.9/graphdefanglib.pl	Thu Oct  9 23:27:29 2003
+++ ./graphdefanglib.pl	Tue Oct 28 17:05:27 2003
@@ -121,7 +121,7 @@
 			if ($entrytime < $nowdeletetime ) {
 			foreach my $event (keys %{$data{$deletetime}{$entrytime}}) {
 				foreach my $type (keys %{$data{$deletetime}{$entrytime}{$event}}) {
-					if ($type ne 'summary') {
+					if ($type ne 'summary' && $type ne 'average') {
 						my %total = ();
 						foreach my $value (keys %{$data{$deletetime}{$entrytime}{$event}{$type}}) {
 							$total{$value} = $data{$deletetime}{$entrytime}{$event}{$type}{$value};
@@ -189,7 +189,7 @@
 }
 
 sub read_and_summarize_data($$) {
-	use vars qw(%event $text $pid %spamd %user_unknown $event $value1 $value2 $sender $recipient $subject $NumEvents $FoundNewRow $unixtime $MaxDBUnixTime);
+	use vars qw(%event $text $pid %spamd %user_unknown $event $value1 $value2 $sender $recipient $subject $average $NumEvents $FoundNewRow $unixtime $MaxDBUnixTime);
         my $fn = shift;
 	my $nomax = shift;
         my %data = ();
@@ -222,7 +222,10 @@
 
 	# Open SummaryDB
 	read_summarydb(\%data, O_RDONLY|O_CREAT) if (!$NODB);
-       
+
+# if there is no file to read, don't read it and don't make a backup of
+# the db file... no reason to
+if (! $NOFILE) {
 	# Open log file 
 	tie *ZZZ, 'File::ReadBackwards', $fn || die("can't open datafile: $!");
 
@@ -265,8 +268,8 @@
 		my $program = $3;
 		$pid = $4;
 		$text = $5;
-	
-		# Parse date string from syslog using any TIMEZONE info from the config file.	
+
+		# Parse date string from syslog using any TIMEZONE info from the config file.
 		if (defined $TZ{$host}) {
 			my $zone = tz2zone($TZ{$host});
 			$unixtime=str2time($datestring,$zone);
@@ -287,6 +290,7 @@
 		$sender = '';
 		$recipient = '';
 		$subject = '';
+		$average = '';
 
 		$NumEvents = 1;
 		$FoundNewRow = 0;
@@ -313,6 +317,12 @@
 				$data{$timesummary}{$summarytime}{$event}{'recipient'}{$recipient}+=$NumEvents 	if ($recipient ne '');
 				$data{$timesummary}{$summarytime}{$event}{'subject'}{$subject}+=$NumEvents     	if ($subject ne '');
 
+$data{$timesummary}{$summarytime}{$event}{'average'} =
+((defined $data{$timesummary}{$summarytime}{$event}{'average'} ?
+          $data{$timesummary}{$summarytime}{$event}{'average'} : 0) * $NumNewLines{$host} + $average) /
+ ($NumNewLines{$host} + 1)
+if ($average ne '');
+
 				# Store the maximum unixtime per timesummary for later reference
 				$data{'maxhosttime'}{$host} = $unixtime 
 					if (!defined($data{'maxhosttime'}{$host}) 
@@ -333,7 +343,8 @@
 			}
 		}
 	}
-        return %data;
+}
+        return \%data;
 }
 
 sub get_all_data_types($) {
@@ -425,7 +436,11 @@
 	}
 
         # Set Grouping Title
-        if ($settings->{grouping} eq 'summary') {
+	if ($settings->{grouping} eq 'average') {
+		$autotitle = $autotitle . " Average";
+		$autotitle = $autotitle . " " . $settings->{average_title} if defined $settings->{average_title};
+
+	} elsif ($settings->{grouping} eq 'summary') {
                 $autotitle = $autotitle . " Total Counts ";
         } elsif ($settings->{grouping} eq 'value1') {
 		if (defined($settings->{value1_title})) {
@@ -573,13 +588,13 @@
 			push @{$settings->{'data_types'}}, $key;
 		}
 	}
+
 	# Summarize totals across time interval
 	for (my $time=$cutofftime; $time<=$currenttime; $time += $settings->{x_axis_num_sec_incr}) {
 		my $date = get_unixtime_by_timesummary($settings->{grouping_time},$time);
 
 		# Get total for summary grouping
 		if ($settings->{'grouping'} eq 'summary') {
-
 			foreach my $datatype (@{$settings->{'data_types'}}) {
 				if (defined($data->{$settings->{grouping_time}}{$date}{$datatype}{'summary'})) {
 					$Total{$datatype} += $data->
@@ -591,7 +606,18 @@
 					$Total{$datatype} += 0;
 				}
 			}
-
+		} elsif ($settings->{'grouping'} eq 'average') {
+			foreach my $datatype (@{$settings->{'data_types'}}) {
+				if (defined($data->{$settings->{grouping_time}}{$date}{$datatype}{'average'})) {
+					$Total{$datatype} = $data->
+							{$settings->{grouping_time}}
+							{$date}
+							{$datatype}
+							{'average'};
+				} else {
+					$Total{$datatype} = 0;
+				}
+			}
 		} else {
 			# Get total for other groupings
 
@@ -653,6 +679,10 @@
 		foreach my $datatype (@{$settings->{'data_types'}}) {
 			push @Legend, "\u$datatype, Total = $Total{$datatype}";
 		}
+	} elsif ($settings->{'grouping'} eq 'average') {
+		foreach my $datatype (@{$settings->{'data_types'}}) {
+			push @Legend, "\u$datatype";
+		}
 	} else {
 		my $i=0;
 		foreach my $TopNName (sort { $Total{'value'}{$b} <=> $Total{'value'}{$a} } keys %{$Total{'value'}} ) {
@@ -684,8 +714,33 @@
 
 		my $i=0;
 		push @{$GraphData[$i]}, $datestring;
-		
-		if ( $settings->{'grouping'} eq 'summary' ) {
+
+		if ( $settings->{'grouping'} eq 'average') {
+			foreach my $datatype (@{$settings->{'data_types'}}) {
+
+			# Data format:
+			#$data{$timesummary}{$summarytime}{$event}{'summary'}++;
+			#$data{$timesummary}{$summarytime}{$event}{'value1'}{$value1}++
+
+				$i++;
+				# Set any undefined values to 0 so GD::Graph
+				# has something to graph
+				if ( defined($data->
+						{$settings->{grouping_time}}
+						{$date}
+						{$datatype}
+						{'average'}) ) {
+					push @{$GraphData[$i]}, $data->
+								{$settings->{grouping_time}}
+								{$date}
+								{$datatype}
+								{'average'};
+				} else {
+					push @{$GraphData[$i]}, 0;
+				}
+			}
+
+		} elsif ( $settings->{'grouping'} eq 'summary') {
 			foreach my $datatype (@{$settings->{'data_types'}}) {
 
 			# Data format:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: grouper.pl
Type: application/x-perl
Size: 1764 bytes
Desc: 
URL: <https://lists.mimedefang.org/pipermail/mimedefang_lists.mimedefang.org/attachments/20031103/13822ae2/attachment.pl>

