Junk Analysis Detail
March 26, 2010
7:11 am
Pollik
Guest
Guests

I am really liking Junquilla. Until now, I have always struggled with Thunderbird's anti-spam filtering.

I am puzzled by one issue that can contribute towards false positives. Looking at the Junk Analysis Detail, the receiving email address is always allocated a high token value:

to:<myaddress@mydomain.net>          Token 94%

Such a high allocation seems rather odd to me, since a large proportion of the emails that I get are addressed to me.


Polly

March 26, 2010
8:58 am
Admin
Moderators
Forum Posts: 423
Member Since:
July 12, 2008

There are lots of oddities in the tokens if you really take the time to examine them. I can't recall a case though where I examined the "why" of a particular token and decided that there was a true bug in the software. Most are training issues.

What that token is telling you is that, of the messages that you have trained, those sent to that address were much more often junk than good. Even if "a large proportion of the emails that I get (and want)" are addressed that way, an even LARGER proportion of those that you get (and don't want) are addressed that way too.
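To make that concrete, here is a rough sketch of how a per-token score like the 94% above could come out of trained counts. The function name, the formula (each class's rate of containing the token, compared against the other) and the counts are all illustrative assumptions, not the filter's actual code or anyone's real training data.

```python
def token_junk_score(junk_with_token, junk_total, good_with_token, good_total):
    """Simplified per-token junk score from trained message counts.

    Assumption: the score compares how often the token appears in trained
    junk with how often it appears in trained good mail. A real filter may
    smooth or weight these numbers differently.
    """
    if junk_total <= 0 or good_total <= 0:
        raise ValueError("need at least one trained message of each kind")
    junk_rate = junk_with_token / junk_total
    good_rate = good_with_token / good_total
    if junk_rate + good_rate == 0:
        return 0.5  # token never seen in training: no evidence either way
    return junk_rate / (junk_rate + good_rate)


# Made-up counts: the to: token shows up in 470 of 500 trained junk
# messages but only 30 of 500 trained good messages.
score = token_junk_score(470, 500, 30, 500)
print(f"to: token score is about {score:.2f}")
```

With numbers like these the token lands near 0.94 even though some good mail carries it, because the junk carries it far more often.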

Now that could be real, or it could be an issue of inadequate training. For training, you should have about the same number of junk as good messages trained. If it is real though, then the junk analyzer is biased toward marking as "junk", and it should be. Tinkering with that one token is not likely to improve the overall performance of the filter. Look at it this way: a message with no real content, but addressed to you, has a 94% chance of being junk. Think something like image spam here.
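The "no real content" case can be illustrated with a simple way of combining token scores. This is a sketch of a naive Bayes style combination, assumed here for illustration only; the filter's real combining rule may well differ, and the function name and the clamping limits are hypothetical.

```python
import math


def combine_token_scores(scores, floor=0.01, ceiling=0.99):
    """Combine per-token junk scores into one message score.

    Assumption: tokens are treated as independent evidence. Scores are
    clamped so that a single 0.0 or 1.0 cannot decide the result alone.
    """
    clamped = [min(max(s, floor), ceiling) for s in scores]
    junk_evidence = math.prod(clamped)
    good_evidence = math.prod(1.0 - s for s in clamped)
    return junk_evidence / (junk_evidence + good_evidence)


# A message whose only informative token is the to: address at 0.94.
only_address = combine_token_scores([0.94])

# The same message with one extra token that leans strongly toward "good".
with_good_token = combine_token_scores([0.94, 0.10])

print(f"address only: {only_address:.2f}")
print(f"address plus a good-leaning token: {with_good_token:.2f}")
```

When the address is the only evidence, the message score is just the token score. Any other token that leans toward good pulls the result down, which is why that one token matters mostly for messages with little else in them.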

On my own system, I am now experimentally using a filter (from FiltaQuilla) to automatically train messages as good when they match my address book. I still need to add the capability to limit the number of trained good messages to about the same as the number of trained junk, as issues can also arise from too much training of good (particularly if the messages differ in some systematic way from the junk, such as arriving at different times).
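A cap like the one described could be sketched as follows. This is only an illustration of the idea of keeping the two training sets about the same size. The function name and the message identifiers are hypothetical, and nothing here is part of FiltaQuilla or Thunderbird.

```python
import random


def cap_good_training(good_ids, junk_ids, seed=0):
    """Pick which good messages to train so good does not outnumber junk.

    Assumption: messages are referred to by some identifier, and a random
    subset is acceptable. Random sampling is used so the kept messages are
    not all from one period of time.
    """
    limit = min(len(good_ids), len(junk_ids))
    rng = random.Random(seed)  # fixed seed keeps the choice repeatable
    return rng.sample(list(good_ids), limit)


# Made-up identifiers: 50 address-book matches but only 20 trained junk.
good = [f"good-{i}" for i in range(50)]
junk = [f"junk-{i}" for i in range(20)]

to_train = cap_good_training(good, junk)
print(len(to_train), "good messages selected for training")
```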

March 27, 2010
12:50 am
Pollik
Guest
Guests

That might explain it. Thanks for your comments.

It still seems to me that the addressee field is a poor indicator of junk, but it is how it is and I am always grateful to the guys who are putting these things out in the public domain.

Keep up the good work. 🙂

Forum Timezone: UTC -8
