Junk Analysis Detail


7:11 am
March 26, 2010


Pollik

Guest

I am really liking Junquilla – till now, I have always struggled with Thunderbird's anti-spam.

I am puzzled by one issue that can contribute towards false positives. Looking at the Junk Analysis Detail, the receiving email address is always allocated a high token value:

to:<myaddress@mydomain.net>          Token 94%

Such a high allocation seems rather odd to me – a large proportion of the emails that I get are addressed to me.


Polly

8:58 am
March 26, 2010


rkent

Admin

posts 140

There are lots of oddities in the tokens if you really take the time to examine them. I can't recall a case though where I examined the "why" of a particular token and decided that there was a true bug in the software. Most are training issues.

What that token is telling you is that, of the messages that you have trained, a much higher proportion of the junk than of the good was sent to: that address. Even if "a large proportion of the emails that I get (and want)" are like that, an even LARGER proportion of those that you get (and don't want) are like that.
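To make the arithmetic concrete, here is a minimal sketch of how a per-token score can fall out of the trained counts. It assumes a simplified Graham-style ratio of rates; the function name, the counts, and the formula itself are illustrative assumptions, not Thunderbird's actual implementation, which may apply further adjustments.

```python
def token_junk_score(junk_with_token, junk_total, good_with_token, good_total):
    """Naive junk score in [0, 1] for a single token (simplified sketch)."""
    # Fraction of trained junk / good messages that contain the token.
    junk_rate = junk_with_token / junk_total
    good_rate = good_with_token / good_total
    if junk_rate + good_rate == 0:
        return 0.5  # token never seen in training: treat as neutral
    return junk_rate / (junk_rate + good_rate)


# Hypothetical training counts, chosen only to illustrate the ratio.
mostly_junk = token_junk_score(188, 200, 12, 200)   # about 0.94
seen_in_both = token_junk_score(95, 100, 60, 100)   # about 0.61
print(round(mostly_junk, 2), round(seen_in_both, 2))
```

Under this simplified formula, a score near 94% only appears when very few of the trained good messages carry the token. If most good mail really does carry it, the score drifts back toward the middle, which is why a surprising value often points at what was trained rather than at a bug.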

Now that could be real, or it could be an issue of inadequate training. For training, you should have about the same number of junk as good messages trained. If it is real though, then the junk analyzer is biased toward marking as "junk", and it should be. Tinkering with that one token is not likely to improve the overall performance of the filter. Look at it this way: a message with no real content, but addressed to you, has a 94% chance of being junk. Think something like image spam here.

On my own system, I am now experimentally using a filter (from FiltaQuilla) to automatically train messages as good when they match my address book. I still need to add the capability to limit the number of trained good messages to about the same as the number of trained junk, as there can also be issues that arise from too much training of good (particularly if the messages differ in some systematic way from the junk, such as arriving at different times).
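The capping idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `cap_good_training` is a hypothetical helper, not part of FiltaQuilla or Thunderbird, and message IDs stand in for whatever the real filter would operate on.

```python
import random


def cap_good_training(good_ids, junk_count, seed=None):
    """Return at most junk_count good message IDs to train (hypothetical helper)."""
    good_ids = list(good_ids)
    if len(good_ids) <= junk_count:
        # Already balanced or short on good mail: train everything available.
        return good_ids
    # Sample at random across the whole pool instead of taking the newest
    # messages, so the trained good mail is not clustered in one time span.
    rng = random.Random(seed)
    return rng.sample(good_ids, junk_count)


# Usage with made-up IDs: 1000 candidate good messages, 150 trained junk.
to_train = cap_good_training(range(1000), junk_count=150, seed=1)
print(len(to_train))
```

Sampling at random rather than by recency is a design choice aimed at the systematic differences mentioned above; whether it helps in practice would need to be checked against real mail.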

12:50 am
March 27, 2010


Pollik

Guest

That might explain…thanks for your comments.

It still seems to me that the addressee field is a poor indicator of junk, but it is how it is and I am always grateful to the guys who are putting these things out in the public domain.

Keep up the good work. :)
