Zero Junk% but a 100% running % in Junk Analysis Details
February 10, 2012
5:53 am
Member
Forum Posts: 4
Member Since: February 10, 2012

Hi all.

For some strange reason, even though I always mark messages from a specific sender as junk, I still get some messages from him that Thunderbird (and Junquilla) automatically classify as not junk, either because their Junk% is low (e.g. 30) or because it is exactly zero. Yet when I run "Junk Analysis Detail" on each of those messages, I get a final running % of 100. How can this be?

(I am running Thunderbird 3.1.15.)

 

Thanks

Giorgos

February 14, 2012
12:35 pm
Admin
Forum Posts: 323
Member Since: July 12, 2008

"How can this be":

Usually because you have trained on the message. The Junk% shows the percentage computed at the time the message was classified. If you train later, "Junk Analysis Detail" shows the result based on the new training.
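To see how that can produce a Junk% near zero alongside a 100% re-analysis, here is a minimal C++ sketch. It uses a toy token database (`TokenDb`, a hypothetical name) and a simple Graham-style combination rather than Thunderbird's actual formula, so only the qualitative behavior carries over: the score saved at classification time is a snapshot, while a later analysis recomputes it from the current training.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// Toy junk filter, used only to illustrate why a stored Junk% can disagree
// with a later re-analysis. The per-token formula and the Graham-style
// combination below are simplifications, NOT Thunderbird's actual algorithm.

int countOf(const std::map<std::string, int>& counts, const std::string& token) {
  auto it = counts.find(token);
  return it == counts.end() ? 0 : it->second;
}

struct TokenDb {
  std::map<std::string, int> junkCount, goodCount;
  int junkMessages = 0, goodMessages = 0;

  // Training adds every token of the message to the junk or good counts.
  void train(const std::vector<std::string>& tokens, bool isJunk) {
    for (const auto& t : tokens) (isJunk ? junkCount : goodCount)[t]++;
    (isJunk ? junkMessages : goodMessages)++;
  }

  // Junk probability of one token, clamped to [0.01, 0.99];
  // tokens never seen in training are neutral (0.5).
  double tokenProb(const std::string& t) const {
    double j = countOf(junkCount, t), g = countOf(goodCount, t);
    if (j + g == 0) return 0.5;
    double junkRate = junkMessages ? j / junkMessages : 0.0;
    double goodRate = goodMessages ? g / goodMessages : 0.0;
    return std::clamp(junkRate / (junkRate + goodRate), 0.01, 0.99);
  }

  // Combine token probabilities as P = prod(p) / (prod(p) + prod(1 - p)),
  // returned as a rounded percentage.
  int junkPercent(const std::vector<std::string>& tokens) const {
    double pJunk = 1.0, pGood = 1.0;
    for (const auto& t : tokens) {
      double p = tokenProb(t);
      pJunk *= p;
      pGood *= 1.0 - p;
    }
    return static_cast<int>(std::lround(100.0 * pJunk / (pJunk + pGood)));
  }
};
```

For example, train one good message `{meeting, agenda, lunch}` and one junk message `{winner, free, prize}`, then score `{lunch, special, offer}`: `junkPercent` returns 1. After `train(msg, true)` marks that same message as junk, scoring the same tokens returns 100, even though the value saved when the message arrived is still 1.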

"though I always mark messages from a specific sender as junk I still get some messages from him which are automatically considered not junk":

In Thunderbird's Bayesian tokenizer, at the point where the sender would be added to the token list, there is this comment:

// important: leave out sender field. Too strong of an indicator

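So someone in the distant past decided that the sender was too strong an indicator, and as a result the sender is not one of the tokens the filter considers. That is why marking a sender's messages as junk does not, by itself, teach the filter to flag that sender.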
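A minimal sketch of the effect of that rule, assuming a tokenizer that turns each header into `name:word` tokens and skips a header named `sender`. The thread does not say whether the real code keys this on `Sender:`, `From:`, or both; `Header` and `tokenizeHeaders` are hypothetical names, and this is not Thunderbird's code.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// A header as (lowercase name, value). Hypothetical representation.
using Header = std::pair<std::string, std::string>;

// Hypothetical header tokenizer: every header except the skipped one is split
// on whitespace, and each word becomes a "name:word" token.
std::vector<std::string> tokenizeHeaders(const std::vector<Header>& headers) {
  std::vector<std::string> tokens;
  for (const auto& [name, value] : headers) {
    // Mirrors "leave out sender field. Too strong of an indicator":
    // the sender never becomes a token, so training cannot learn it.
    if (name == "sender") continue;
    std::string word;
    for (char c : value + ' ') {  // trailing space flushes the last word
      if (std::isspace(static_cast<unsigned char>(c))) {
        if (!word.empty()) tokens.push_back(name + ":" + word);
        word.clear();
      } else {
        word += c;
      }
    }
  }
  return tokens;
}
```

Because no token mentions the sender, marking many messages from the same sender as junk only trains on their other words, so a new message from that sender with innocuous wording can still score low.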

At one point, I added to the base code the ability to work around these limitations. I don't have the time at the moment to work on this with you, though. But my code comment, which is still valid if you want to experiment yourself, is this:

  /*
   * Extensions may wish to enable or disable tokenization of certain headers.
   * Define any headers to enable/disable in a string preference like this:
   *   "mailnews.bayesian_spam_filter.tokenizeheader.headername"
   *
   * where "headername" is the header to tokenize. For example, to tokenize the
   * header "x-spam-status" use the preference:
   *
   *   "mailnews.bayesian_spam_filter.tokenizeheader.x-spam-status"
   *
   * The value of the string preference will be interpreted in one of
   * four ways, depending on the value:
   *
   *   If "false" then do not tokenize that header
   *   If "full" then add the entire header value as a token,
   *     without breaking up into subtokens using delimiters
   *   If "standard" then tokenize the header using as delimiters the current
   *     value of the generic header delimiters
   *   Any other string is interpreted as a list of delimiters to use to parse
   *     the header. \t, \n, \v, \f, \r, and \\ will be escaped to their normal
   *     C-library values, all other two-letter combinations beginning with \
   *     will be ignored.
   *
   * Header names in the preference should be all lower case
   *
   * Extensions may also set the maximum length of a token (default is
   * kMaxLengthForToken) by setting the int preference:
   *   "mailnews.bayesian_spam_filter.maxlengthfortoken"
   */
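The value rules in that comment are concrete enough to sketch. The C++ below is a hypothetical illustration, not the Thunderbird implementation: `HeaderMode`, `interpretPref`, `splitOn`, and `applyRule` are invented names, and the caller must supply the "standard" delimiters because the comment does not list them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// How a value of the string preference
// "mailnews.bayesian_spam_filter.tokenizeheader.<headername>" is interpreted,
// per the comment above. Type and function names are hypothetical.
enum class HeaderMode { Disabled, Full, Standard, CustomDelimiters };

struct HeaderRule {
  HeaderMode mode;
  std::string delimiters;  // used only by CustomDelimiters
};

// Turn \t \n \v \f \r \\ into their C-library characters; drop any other
// backslash pair, as the comment specifies.
std::string unescapeDelimiters(const std::string& raw) {
  std::string out;
  for (std::size_t i = 0; i < raw.size(); ++i) {
    if (raw[i] != '\\') {
      out += raw[i];
      continue;
    }
    if (i + 1 == raw.size()) break;  // assumption: a lone trailing backslash is ignored
    switch (raw[++i]) {
      case 't': out += '\t'; break;
      case 'n': out += '\n'; break;
      case 'v': out += '\v'; break;
      case 'f': out += '\f'; break;
      case 'r': out += '\r'; break;
      case '\\': out += '\\'; break;
      default: break;  // other two-letter \x combinations are ignored
    }
  }
  return out;
}

HeaderRule interpretPref(const std::string& value) {
  if (value == "false") return {HeaderMode::Disabled, ""};
  if (value == "full") return {HeaderMode::Full, ""};
  if (value == "standard") return {HeaderMode::Standard, ""};
  return {HeaderMode::CustomDelimiters, unescapeDelimiters(value)};
}

// Split `value` into subtokens at any character in `delims`, skipping empties.
std::vector<std::string> splitOn(const std::string& value, const std::string& delims) {
  std::vector<std::string> parts;
  std::size_t start = value.find_first_not_of(delims);
  while (start != std::string::npos) {
    std::size_t end = value.find_first_of(delims, start);
    parts.push_back(value.substr(start, end - start));
    start = value.find_first_not_of(delims, end);
  }
  return parts;
}

// Tokens produced for one header value. `standardDelims` is a stand-in for the
// generic header delimiters, whose actual value the comment does not give.
std::vector<std::string> applyRule(const HeaderRule& rule, const std::string& value,
                                   const std::string& standardDelims) {
  switch (rule.mode) {
    case HeaderMode::Disabled: return {};
    case HeaderMode::Full: return {value};
    case HeaderMode::Standard: return splitOn(value, standardDelims);
    case HeaderMode::CustomDelimiters: return splitOn(value, rule.delimiters);
  }
  return {};
}
```

As an example of the naming rule, a string preference `mailnews.bayesian_spam_filter.tokenizeheader.from` set to `full` would, if the mechanism works as the comment describes, add the whole From header as a single token. Whether this brings back the sender that the tokenizer otherwise leaves out is an assumption to verify before relying on it.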

Kent James

February 15, 2012
6:22 am
Member
Forum Posts: 4
Member Since: February 10, 2012

Thank you very much, Kent, for the detailed response.

Giorgos
