[Mimedefang] Message sizes (was: Making an md5 key of the body)

Jan-Pieter Cornet johnpc at xs4all.nl
Wed Mar 28 18:30:02 EDT 2007


On Wed, Mar 28, 2007 at 09:48:07AM -0500, Chris Myers wrote:
> The average message size is about 6-10KB (getting larger due to image 
> spam)

Is it? How are you calculating that, and what's the maximum size
your mail server accepts?

I am seeing vastly different numbers... You actually made me recalculate
the average mail size :) The last time I did that was more than a year
ago, and back then I saw 30K incoming and 100K outgoing, on average.

I'm now seeing an average of 150K on our outgoing server, and an average
of 70K on our incoming server. This difference is mainly due to the
amount of spam on the incoming server, which exceeds 80%. Spam on the
outgoing server is not 0% either, but probably around 3-5% (some real
outgoing spam, but mostly spam forwarded from end-user systems to external
addresses: not technically "spam", but "having a spammish nature").

Here are the detailed stats on yesterday's mail:

Outgoing:

Average size   : 154159.08 (+- 1352073.80)
Number of mails: 2137525
Range          : 1 .. 262742680
< 10           : 0.0% (14 mails)
< 100          : 0.0% (96 mails)
< 1000         : 6.7% (142758 mails)
< 10000        : 52.9% (1129809 mails)
< 100000       : 31.2% (667423 mails)
< 1000000      : 6.4% (136857 mails)
< 10000000     : 2.6% (56364 mails)
< 100000000    : 0.2% (4185 mails)
< 1000000000   : 0.0% (19 mails)

Incoming:

Average size   : 71627.36 (+- 605008.58)
Number of mails: 4958206
Range          : 1 .. 598019508
< 10           : 0.0% (18 mails)
< 100          : 0.0% (199 mails)
< 1000         : 3.7% (183060 mails)
< 10000        : 51.2% (2537274 mails)
< 100000       : 40.8% (2021056 mails)
< 1000000      : 2.9% (142017 mails)
< 10000000     : 1.5% (72338 mails)
< 100000000    : 0.0% (2243 mails)
< 1000000000   : 0.0% (1 mails)

Note that the large standard deviation (the figure after the +-) is an
indication that the average isn't very representative of a typical
mail. The histogram shows that too: in both directions, more than half
of all mails are under 10K.
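For anyone wanting to produce a similar report, here is a minimal sketch in Python. It assumes you have already extracted the message sizes in bytes (e.g. from your mail logs) into a list of non-negative integers; the function name `size_report` is hypothetical, and treating the "+-" figure as the population standard deviation is an assumption, since the post doesn't say which variant was used.

```python
import statistics
from collections import Counter


def size_report(sizes: list[int]) -> str:
    """Hypothetical helper: mean +- std dev, range, and a power-of-ten histogram.

    Assumes sizes are non-negative integer byte counts and that "+-" means
    the population standard deviation (an assumption, not from the post).
    """
    n = len(sizes)
    mean = statistics.fmean(sizes)
    sd = statistics.pstdev(sizes)
    # A size with d decimal digits satisfies 10**(d-1) <= s < 10**d, so the
    # digit count selects the "< 10**d" bucket exactly, without float logs.
    # Buckets are per-range counts, not cumulative.
    buckets = Counter(len(str(s)) for s in sizes)
    lines = [
        f"Average size   : {mean:.2f} (+- {sd:.2f})",
        f"Number of mails: {n}",
        f"Range          : {min(sizes)} .. {max(sizes)}",
    ]
    for d in range(1, max(buckets) + 1):
        count = buckets.get(d, 0)
        label = f"< {10 ** d}"
        lines.append(f"{label:<15}: {100 * count / n:.1f}% ({count} mails)")
    return "\n".join(lines)


print(size_report([5, 50, 500, 5000, 5000]))
```

The buckets are per range rather than cumulative, which matches the tables above: the per-bucket counts add up exactly to the "Number of mails" line for both servers.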

Hmm, if I calculate the average over the logarithm of the size, the
outgoing server, for example, gives me 8908.69 (+- 6.05). That is
10 ** avg(log10(mailsize)) (i.e. the geometric mean), +- the std dev
calculated over the log10 values. That is a lot closer to your observed
average of 6-10K.
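The log-average described above can be sketched like this, assuming all sizes are positive (log10 is undefined for 0). The name `log_average` is hypothetical. The sketch reports the spread as the multiplicative factor 10 ** std(log10(size)), which is one common way to express a log-space deviation; whether the 6.05 above is that factor or the raw log-space value isn't stated, so treat this as an illustration rather than a reproduction of those exact numbers.

```python
import math
import statistics


def log_average(sizes: list[int]) -> tuple[float, float]:
    """Hypothetical helper: (geometric mean, multiplicative std-dev factor).

    Sizes must be positive; math.log10 raises ValueError for 0.
    Uses the population std dev of the log10 values (an assumption).
    """
    logs = [math.log10(s) for s in sizes]
    mean_log = statistics.fmean(logs)
    sd_log = statistics.pstdev(logs)
    # Back-transform: 10 ** mean(log10 x) is the geometric mean, and
    # 10 ** sd(log10 x) is how many times larger or smaller "one std dev" is.
    return 10 ** mean_log, 10 ** sd_log


sizes = [100, 10000]
print(statistics.fmean(sizes))  # arithmetic mean
print(log_average(sizes))       # geometric mean and spread factor
```

For the two sizes in the example, the arithmetic mean is 5050 while the geometric mean is 1000: the large message pulls the plain average up much more than the log-average, which is the same effect the heavy tail (up to 262 MB outgoing, 598 MB incoming) has on the figures above.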

-- 
Jan-Pieter Cornet <johnpc at xs4all.nl>
!! Disclaimer: The addressee of this email is not the intended recipient. !!
!! This is only a test of the echelon and data retention systems. Please  !!
!! archive this message indefinitely to allow verification of the logs.   !!
