[Mimedefang] Script for categorizing spam by hits?
Renaud PASCAL
renaud.pascal at atosorigin.com
Thu Sep 28 08:23:18 EDT 2006
On Tuesday 19 July 2005 21:49, Johann wrote:
>
> Renaud PASCAL wrote:
> >
> > Well, not exactly the same slices, and not in the same cols/rows order,
> > also, hum, let's say "the beautifying is quite rough" ...
> > but it may give you a start ;-)
> >
> > # gawk '/spam,/{ v[int( $4/5)]++}; END{for(i in v){print " scores "5*i" to "5*(i+1)", have:" v[i]" hits"} }' FS=, .../yourmaillog
>
> Thank You! I hadn't really thought about a more detailed analysis than
> what GraphDefang does, but this is really cool. A little css, a little
> php, and viola! http://mail.srar.com/stats/stats.php . Justification for
> tweaking the dang thing daily!
Return of the spiders :-)
I recently wrote a new script for a quick-glance analysis of
what is going on. It is a bit more intricate than the "2005 version",
but it has some niceties :-)
(It requires bash and gawk, and is probably easy to port to anything
better than csh and the old Solaris awk ;-)
(As written it also assumes that spam and ham are tagged, but the same could
easily be done with another pre-filter such as /,spam,/ /,ham,/ or better :-)
Help yourself :-)
There are two parts: the wrapper, which selects the input file(s),
and the score part, which draws a diagram from the input.
Here's the wrapper:
---------
# cat DIAG2
#!/bin/sh
# Log lines where the milter added a score header; the score is field 12.
PATTERN='Milter change .add.: .*(X-Spam-free: |X-Spam-Score: )'
DIAG=/home/admin/spamtools/_diagscores2
all (){
    # rotated (compressed) logs first, then the current log
    ( zgrep -E "$PATTERN" /var/log/maillog.*.gz; grep -E "$PATTERN" /var/log/maillog ) | awk '{print $12}' | sort -rn | "$DIAG"
}
today (){
    grep -E "$PATTERN" /var/log/maillog | awk '{print $12}' | sort -rn | "$DIAG"
}
# any argument: all logs; no argument: current log only
# ("==" inside [ ] is not portable to every /bin/sh, hence -ne)
if [ $# -ne 0 ]; then all; else today; fi
---------
Here's the diagram tool :
---------
# cat _diagscores2
#!/bin/bash
###
###
### Draw a diagram of the antispam score distribution.
### This version counts the real numbers but draws each bar with a
### length equal to the square root of the count, which keeps the
### distribution readable: the scores are now very contrasted (a good
### sign), so the bars for high scores were becoming too tiny to notice :-)
###
### Note: yes, I thought about using logarithmic scaling,
### but it would compress the bars too much.
###
### V1.0.2 renaud.pascal at atosorigin.com 200609121806 GMT00
### V1.0.3 renaud.pascal at atosorigin.com 200609121806 GMT00
### * bug fix: the root of the largest count could be smaller than the
### width; the scale is now inverted in that case to avoid a division
### by zero and ridiculously narrow graphs.
###
awk ' \
BEGIN{width=50}
{
Score=int(0+$1)
TOT++;
if(Score < 5) TOTHAM++;
if(Score >= 5) TOTSPAM++;
if(Score >= 14) TOT14++;
if(Score <= -25){vec[-25]++;next}
if(Score <= -10){vec[-10]++;next}
if(Score <= -5){vec[-5]++;next}
if(Score <= 0){vec[0]++;next}
if(Score >= 30){vec[30]++;next}
vec[Score]++
}
END{
for(any in vec){ sqvec[any]=(vec[any])^(.5) }
printf "\033[1;46m%10s (%8s) |\033[1;35m%s%s\033[0m (the graph is square rooted for precision)\n", "Scores", "nombre", "", ""
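# longest square-rooted bar over all buckets, negative ones included
max=sqvec[0]
for(any in sqvec){ if(sqvec[any] > max){max=sqvec[any]} }
# one "=" stands for <scale> units of square root
scale=int((max/width)+.5)
# longest bar shorter than the width: invert the scale so it never becomes 0
if(width > max) scale=1/(int((width/max)+.5))
printf " | \033[1;35mScale=%-8.2f \033[0m|\n",scale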
show_chart_bar(scale,width,"-25 et -",vec[-25],sqvec[-25])
show_chart_bar(scale,width," -10 -25",vec[-10],sqvec[-10])
show_chart_bar(scale,width," -5 -10",vec[-5],sqvec[-5])
show_chart_bar(scale,width," O -5",vec[0],sqvec[0])
for(i=1;i<30;i++){ show_chart_bar(scale,width,i,vec[i],sqvec[i]) }
show_chart_bar(scale,width," 30 et +",vec[30],sqvec[30])
print " Total : "TOT
print " Dont < 5 : \033[1;36m"TOTHAM"\033[0m\t"P100(TOTHAM,TOT)
print " >=5 : \033[1;31m"TOTSPAM"\033[0m\t"P100(TOTSPAM,TOT)
print " >= 14 : \033[1;31m"TOT14"\033[0m\t"P100(TOT14,TOT)
}
function P100(partie,tout)
{
return (int(10000*partie/tout))/100"p100"
}
function show_chart_bar(scale,width,val,score,sqscore)
{
barf="=================================================="
grid=sprintf("%" width "s", "")   # width spaces, pads each bar up to the right border
pval=int( (sqscore + .5)/scale)
lefty=substr(barf, 1, pval)
righty=substr(grid, pval+1, width-pval)
if(pval == 1) lefty=")"
if(!sqscore) lefty=substr(grid, 1, pval)
if ((0+val) >= 14)
printf "\033[1;37m%10s (%8d) |\033[1;31m%s%s\033[0m|\n", val, score, lefty, righty
else if ((0+val) >= 1)
printf "\033[1;37m%10s (%8d) |\033[1;36m%s%s\033[0m|\n", val, score, lefty, righty
else printf "\033[1;37m%10s (%8d) |\033[1;32m%s%s\033[0m|\n", val, score, lefty, righty
}
' "${@}"
---------
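To check the scaling by hand, here is a minimal sketch of the two calculations that decide how long a bar is. The function names bar_len and pick_scale are made up for this illustration and are not part of _diagscores2; they only repeat the arithmetic of the END block and of show_chart_bar for a single value, using sqrt() in place of ^(.5).

```shell
#!/bin/sh
# Sketch only: bar_len and pick_scale are hypothetical helper names,
# not part of _diagscores2; they repeat its arithmetic for one value.

# bar_len COUNT SCALE -> number of characters drawn for a bucket of COUNT messages
bar_len () {
    awk -v n="$1" -v scale="$2" 'BEGIN { print int((sqrt(n) + .5) / scale) }'
}

# pick_scale MAXCOUNT WIDTH -> scale factor, same two steps as the END block
# (MAXCOUNT must be greater than 0, otherwise the division fails)
pick_scale () {
    awk -v n="$1" -v width="$2" 'BEGIN {
        max = sqrt(n)
        scale = int((max / width) + .5)
        if (width > max) scale = 1 / int((width / max) + .5)
        printf "%.2f\n", scale
    }'
}

bar_len 871 1        # prints 30
bar_len 2291 1       # prints 48
pick_scale 2291 50   # prints 1.00
```

With a width of 50 and a largest bucket of 2291 messages the scale comes out as 1.00, so every bar is simply the square root of its count, rounded by the "+ .5" and truncated by int().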
The output uses ANSI colors: very low scores in green, high scores in red,
other scores in cyan and normal text in white. Here is a sample output in plain text:
# ./DIAG2
Scores ( nombre) | (the graph is square rooted for precision)
| Scale=1.00 |
-25 et - ( 871) |============================== |
-10 -25 ( 515) |======================= |
-5 -10 ( 68) |======== |
O -5 ( 2291) |================================================ |
1 ( 150) |============ |
2 ( 162) |============= |
3 ( 151) |============ |
4 ( 230) |=============== |
5 ( 214) |=============== |
6 ( 125) |=========== |
7 ( 97) |========== |
8 ( 106) |========== |
9 ( 148) |============ |
10 ( 125) |=========== |
11 ( 265) |================ |
12 ( 118) |=========== |
13 ( 124) |=========== |
14 ( 104) |========== |
15 ( 54) |======= |
16 ( 68) |======== |
17 ( 38) |====== |
18 ( 17) |==== |
19 ( 21) |===== |
20 ( 13) |==== |
21 ( 5) |== |
22 ( 2) |) |
23 ( 1) |) |
24 ( 1) |) |
25 ( 1) |) |
26 ( 1) |) |
27 ( 0) | |
28 ( 1) |) |
29 ( 0) | |
30 et + ( 4) |== |
Total : 6091
Dont < 5 : 4438 72.86p100
>=5 : 1653 27.13p100
>= 14 : 331 5.43p100
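The percentages in the last three lines can be checked with a small sketch of the P100 calculation. The shell function name p100 is made up for this illustration; the expression inside is the one from the awk function P100, which truncates to two decimals instead of rounding.

```shell
#!/bin/sh
# Sketch only: p100 is a hypothetical shell wrapper around the same
# expression as the awk function P100 (truncation, not rounding).
p100 () {
    awk -v part="$1" -v total="$2" 'BEGIN { print int(10000 * part / total) / 100 "p100" }'
}

p100 4438 6091   # prints 72.86p100
p100 1653 6091   # prints 27.13p100
p100 331 6091    # prints 5.43p100
```

Because of the truncation, the ham and spam shares add up to 99.99 rather than 100 in this sample.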