[Mimedefang] Script for categorizing spam by hits?

Renaud PASCAL renaud.pascal at atosorigin.com
Thu Sep 28 08:23:18 EDT 2006


On Tuesday 19 July 2005 21:49, Johann wrote:
> 
> Renaud PASCAL wrote:
> > 
> > Well, not exactly the same slices, and not in the same cols/rows order,
> > also, hum, let's say "the beautifying is quite rough" ...
> > but it may give you a start ;-)
> > 
> > # gawk '/spam,/{ v[int( $4/5)]++}; END{for(i in v){print " scores "5*i" to "5*(i+1)", have:" v[i]" hits"} }' FS=, .../yourmaillog
> 
> Thank You! I hadn't really thought about a more detailed analysis than
> what GraphDefang does, but this is really cool. A little css, a little
> php, and voilà! http://mail.srar.com/stats/stats.php . Justification for
> tweaking the dang thing daily!

 Return of the spiders :-)

 I recently wrote a new script for a quick-glance analysis of
what is going on. It is a bit more intricate than the "2005 version",
but it has some niceties :-)

 (It requires bash and gawk, and is probably easily portable to anything
better than csh and old Solaris awk ;-)
  (It also assumes that both spam and ham are tagged, but the same could
easily be done with another pre-filter like /,spam,/ /,ham,/ or better :-)
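
 Such a pre-filter might look like the sketch below. It is only an
illustration: it assumes the comma-separated log layout implied by the
2005 one-liner quoted above (score in field 4), and prefilter_scores is
a made-up name, not part of the scripts that follow.

```shell
# Sketch only: prefilter_scores is a hypothetical helper name.
# Assumption: comma-separated log lines with the score in field 4,
# as in the 2005 one-liner (FS=, and $4).
prefilter_scores() {
    # Keep only lines tagged ",spam," or ",ham," and print their score.
    awk -F, '/,spam,/ || /,ham,/ { print $4 }' "$@"
}

# Possible use, feeding the diagram tool shown further down:
#   prefilter_scores /var/log/maillog | sort -rn | _diagscores2
```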

  Help yourself :-)

 There are two parts: the wrapper, which selects the input file(s),
and the score part, which draws a diagram from the input.

 Here's the wrapper :
---------
# cat DIAG2
#!/bin/sh
all (){
( (zgrep -E 'Milter change .add.: .*(X-Spam-free: |X-Spam-Score: )' /var/log/maillog.*.gz);(egrep 'Milter change .add.: .*(X-Spam-free: |X-Spam-Score: )' /var/log/maillog) )  | awk '{print $12}' |sort -rn | /home/admin/spamtools/_diagscores2
}
today (){
egrep 'Milter change .add.: .*(X-Spam-free: |X-Spam-Score: )' /var/log/maillog | awk '{print $12}' |sort -rn | _diagscores2
}
# any argument: scan the rotated logs too; no argument: current log only
[ $# -ne 0 ] && all
[ $# -eq 0 ] && today
---------
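
 The $12 in the wrapper depends on the exact syslog layout. A less
position-dependent variant could look for the header name and print the
word after it. This is a sketch, assuming the score is the
whitespace-separated word right after "X-Spam-Score:" or "X-Spam-free:"
(which is what $12 suggests for my logs); extract_scores is a made-up
helper name.

```shell
# Sketch only: extract_scores is a hypothetical helper name.
# Assumption: the score is the word right after the header name.
extract_scores() {
    awk '{
        # Stop at NF-1 so that $(i+1) always exists.
        for (i = 1; i < NF; i++)
            if ($i == "X-Spam-Score:" || $i == "X-Spam-free:") {
                print $(i + 1)
                break
            }
    }' "$@"
}

# Possible use in place of the  awk '{print $12}'  stage:
#   egrep 'Milter change .add.: ' /var/log/maillog | extract_scores | sort -rn | _diagscores2
```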

Here's the diagram tool :
---------
# cat _diagscores2
#!/bin/bash
###
###
###     Draw a diagram of the antispam score distribution.
###     This version uses the real counts but graphs them with a
###     square-rooted bar length to show the distribution better: as the
###     scores are now very contrasted (a good sign), the bars for high
###     scores were becoming too tiny to notice :-)
###
###     Note: yes, I thought about using logarithmic scaling,
###     but it would compress too much.
###
###     V1.0.2  renaud.pascal at atosorigin.com 200609121806 GMT00
###     V1.0.3  renaud.pascal at atosorigin.com 200609121806 GMT00
###             * bug on max counts whose root was smaller than the width:
###               the scale is now inverted in that case, to avoid a zero
###               division and ridiculously narrow graphs.
###
awk ' \
BEGIN{width=50}
{
        Score=int(0+$1)
        TOT++;
        if(Score < 5) TOTHAM++;
        if(Score >= 5) TOTSPAM++;
        if(Score >= 14) TOT14++;

        if(Score <= -25){vec[-25]++;next}
        if(Score <= -10){vec[-10]++;next}
        if(Score <= -5){vec[-5]++;next}
        if(Score <= 0){vec[0]++;next}
        if(Score >= 30){vec[30]++;next}
        vec[Score]++
}
END{
        for(any in vec){ sqvec[any]=(vec[any])^(.5) }
        printf "\033[1;46m%10s (%8s) |\033[1;35m%s%s\033[0m  (the graph is square rooted for precision)\n", "Scores", "nombre", "", ""
        max=sqvec[0]
        for(i=1;i<31;i++){ if(sqvec[i] > max){max=sqvec[i]} }
        scale=int((max/width)+.5)
        if(width > max) scale=1/(int((width/max)+.5))
        printf "                      |                                   \033[1;35mScale=%-8.2f \033[0m|\n",scale
        show_chart_bar(scale,width,"-25 et -",vec[-25],sqvec[-25])
        show_chart_bar(scale,width," -10 -25",vec[-10],sqvec[-10])
        show_chart_bar(scale,width,"  -5 -10",vec[-5],sqvec[-5])
        show_chart_bar(scale,width,"    O -5",vec[0],sqvec[0])
        for(i=1;i<30;i++){ show_chart_bar(scale,width,i,vec[i],sqvec[i]) }
        show_chart_bar(scale,width,"  30 et +",vec[30],sqvec[30])
        print "        Total : "TOT
        print " Dont     < 5 : \033[1;36m"TOTHAM"\033[0m\t"P100(TOTHAM,TOT)
        print "          >=5 : \033[1;31m"TOTSPAM"\033[0m\t"P100(TOTSPAM,TOT)
        print "        >= 14 : \033[1;31m"TOT14"\033[0m\t"P100(TOT14,TOT)
}
function P100(partie,tout)
{
        return (int(10000*partie/tout))/100"p100"
}
function show_chart_bar(scale,width,val,score,sqscore)
{
        # barf and grid must be at least "width" characters long
        barf="=================================================="
        grid="                                                  "
        # bar length in characters, from the square-rooted count
        pval=int( (sqscore + .5)/scale)
        lefty=substr(barf, 1, pval)
        # padding up to the right border
        righty=substr(grid, pval+1, width-pval)
        # one-character bars are drawn as ")"
        if(pval == 1)   lefty=")"
        # no hits: blank bar
        if(!sqscore)    lefty=substr(grid, 1, pval)
        # red for scores >= 14, cyan for 1 to 13, green for the buckets <= 0
        if ((0+val) >= 14)
                printf "\033[1;37m%10s (%8d) |\033[1;31m%s%s\033[0m|\n", val, score, lefty, righty
        else    if ((0+val) >= 1)
                        printf "\033[1;37m%10s (%8d) |\033[1;36m%s%s\033[0m|\n", val, score, lefty, righty
                else    printf "\033[1;37m%10s (%8d) |\033[1;32m%s%s\033[0m|\n", val, score, lefty, righty
}
' "${@}"
---------
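
 To make the square-root scaling concrete, the bar length computed in
show_chart_bar can be reproduced on its own. A small sketch: bar_len is
a made-up name, and it uses sqrt() where the script uses ^(.5).

```shell
# Sketch only: bar_len is a hypothetical helper name.
# Same formula as pval in show_chart_bar: int((sqrt(count) + .5) / scale)
bar_len() {
    awk -v count="$1" -v scale="${2:-1}" \
        'BEGIN { print int((sqrt(count) + .5) / scale) }'
}

bar_len 2291    # 48
bar_len 871     # 30
bar_len 68      # 8
```

 With a scale of 1 and a width of 50, a count of 2291 nearly fills the
line, while a count of 68 still gets 8 characters instead of the 1 or 2
it would get on a linear scale.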

  The result uses ANSI colors to show very low scores in green, high scores in red,
other scores in cyan and normal text in white. Here is a sample output in plain text:

# ./DIAG2
    Scores (  nombre) |  (the graph is square rooted for precision)
                      |                                   Scale=1.00     |
  -25 et - (     871) |==============================                    |
   -10 -25 (     515) |=======================                           |
    -5 -10 (      68) |========                                          |
      O -5 (    2291) |================================================  |
         1 (     150) |============                                      |
         2 (     162) |=============                                     |
         3 (     151) |============                                      |
         4 (     230) |===============                                   |
         5 (     214) |===============                                   |
         6 (     125) |===========                                       |
         7 (      97) |==========                                        |
         8 (     106) |==========                                        |
         9 (     148) |============                                      |
        10 (     125) |===========                                       |
        11 (     265) |================                                  |
        12 (     118) |===========                                       |
        13 (     124) |===========                                       |
        14 (     104) |==========                                        |
        15 (      54) |=======                                           |
        16 (      68) |========                                          |
        17 (      38) |======                                            |
        18 (      17) |====                                              |
        19 (      21) |=====                                             |
        20 (      13) |====                                              |
        21 (       5) |==                                                |
        22 (       2) |)                                                 |
        23 (       1) |)                                                 |
        24 (       1) |)                                                 |
        25 (       1) |)                                                 |
        26 (       1) |)                                                 |
        27 (       0) |                                                  |
        28 (       1) |)                                                 |
        29 (       0) |                                                  |
   30 et + (       4) |==                                                |
        Total : 6091
 Dont     < 5 : 4438    72.86p100
          >=5 : 1653    27.13p100
        >= 14 : 331     5.43p100
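
 Note that the percentages are truncated by the int() in P100, not
rounded, which is why 72.86 and 27.13 add up to 99.99. A rounding
variant could look like the sketch below; pct is a made-up name, and the
guard against an empty input is an addition that P100 does not have.

```shell
# Sketch only: pct is a hypothetical helper name.
# Rounds to two decimals instead of truncating, and avoids dividing
# by zero when there was no input at all.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN {
        if (total > 0) printf "%.2f%%\n", 100 * part / total
        else           print "n/a"
    }'
}

pct 1653 6091   # 27.14%
```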


