Hi all.
For some strange reason, even though I always mark messages from a specific sender as junk, I still get some messages from him that Thunderbird (and JunQuilla) automatically considers not junk, either because they have a very low positive junk % (e.g. 30) or because their junk % is exactly zero. When I run a "Junk Analysis Detail" on them, I get a final running % of 100 for each of those messages. How can this be?
(I run Thunderbird 3.1.15.)
Thanks
Giorgos
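"How can this be":
Usually because you have since trained on the message. The Junk% shows the percentage computed at the time the message was classified. If you train later (for example, by marking the message as junk), the "Junk Analysis Detail" shows the result with the new training, which is why it can report 100 for a message whose stored Junk% is 30 or 0.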
"though I always mark messages from a specific sender as junk I still get some messages from him which are automatically considered not junk":
In the TB bayes tokenizer, at the point where the sender would be added to the token list there is this comment:
// important: leave out sender field. Too strong of an indicator
So somebody in the distant past decided that the sender was too strong an indicator, and therefore it is not one of the tokens considered.
At one point I added to the base code the ability to work around these limitations. I really don't have the time at the moment to work through this with you, but my code comment, which is still valid if you want to experiment yourself, is this:
/*
* Extensions may wish to enable or disable tokenization of certain headers.
* Define any headers to enable/disable in a string preference like this:
* "mailnews.bayesian_spam_filter.tokenizeheader.headername"
*
* where "headername" is the header to tokenize. For example, to tokenize the
* header "x-spam-status" use the preference:
*
* "mailnews.bayesian_spam_filter.tokenizeheader.x-spam-status"
*
* The value of the string preference will be interpreted in one of
* four ways, depending on the value:
*
* If "false" then do not tokenize that header
* If "full" then add the entire header value as a token,
* without breaking up into subtokens using delimiters
* If "standard" then tokenize the header using as delimiters the current
* value of the generic header delimiters
* Any other string is interpreted as a list of delimiters to use to parse
* the header. \t, \n, \v, \f, \r, and \\ will be escaped to their normal
* C-library values, all other two-letter combinations beginning with \
* will be ignored.
*
* Header names in the preference should be all lower case
*
* Extensions may also set the maximum length of a token (default is
* kMaxLengthForToken) by setting the int preference:
* "mailnews.bayesian_spam_filter.maxlengthfortoken"
*/
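To make the four value rules in that comment concrete, here is a rough Python sketch of how such a preference value might be interpreted. It is only an illustration of the rules as described, not Thunderbird's actual tokenizer: the function names, the "headername:token" output format, and the STANDARD_DELIMITERS value are all assumptions made for the example.

```python
# Illustrative sketch only; names and token format are hypothetical.

# Assumption: a stand-in for the "generic header delimiters" setting.
STANDARD_DELIMITERS = " \t\r\n"

# Escapes the comment says are honoured; other backslash pairs are ignored.
ESCAPES = {"t": "\t", "n": "\n", "v": "\v", "f": "\f", "r": "\r", "\\": "\\"}


def unescape_delimiters(value):
    # Expand the known two-character escapes and drop unknown ones.
    # Assumption: a lone trailing backslash is kept as a literal character.
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            out.append(ESCAPES.get(value[i + 1], ""))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def tokenize_header(name, header_value, pref_value):
    # Header names in the preference are expected to be lower case.
    name = name.lower()
    if pref_value == "false":
        return []  # do not tokenize this header at all
    if pref_value == "full":
        return [f"{name}:{header_value}"]  # whole value as a single token
    if pref_value == "standard":
        delimiters = STANDARD_DELIMITERS
    else:
        delimiters = unescape_delimiters(pref_value)  # custom delimiter list

    tokens, current = [], []
    for ch in header_value:
        if ch in delimiters:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return [f"{name}:{t}" for t in tokens]


if __name__ == "__main__":
    for pref in ("false", "full", "standard", ", ="):
        print(repr(pref), tokenize_header("X-Spam-Status", "Yes, score=7.1", pref))
```

In Thunderbird itself the preference would be a string preference with the name shown in the comment, which could be created through the Config Editor; whether a given header then helps or hurts classification is something to test against your own mail.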
Kent James