
Usenet News and Token Count


9:11 pm
November 25, 2009


front243

Denmark

New Member

posts 2

Hi, I use JunQuilla for email as well as for about a dozen Usenet newsgroups. Would it be a good idea to raise the "Maximum Token Count" from 300,000 to a higher number, since I use it for news as well?

10:11 pm
November 25, 2009


rkent

Admin

posts 140

There is very little experience with managing newsgroup junk in Thunderbird, so I would be happy to hear about your experiences. In general, though, the more tokens the better, so if you are not having performance problems (that is, speed issues), you could increase the limit. But it takes roughly 10x more tokens to double accuracy, so there is clearly a point of diminishing returns here.

Currently, there is a core bug that prevents the tokenizer from separating headers from the body in newsgroup messages, so filtering would be less accurate than for email (plus it looks very different). But the newsgroup spammers are not as clever as those in email (they haven't had to be!), so overall my guess is that it works OK. I have little actual experience with it, though.

But to directly answer your question: yes, it is probably a good idea, but don't expect dramatic changes.
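To put a rough number on "don't expect dramatic changes", here is a minimal sketch of the rule of thumb above. It assumes that "double accuracy" means the error rate halves for every 10x increase in tokens, and that this holds smoothly between those points. Both are assumptions made for illustration, not measured behaviour of the filter, and `relative_error` is a hypothetical helper.

```python
import math


def relative_error(tokens: int, baseline_tokens: int) -> float:
    """Error rate relative to the baseline token count.

    Assumption: every 10x increase in tokens halves the error rate,
    i.e. error is proportional to tokens ** -log10(2).
    """
    exponent = math.log10(2)  # about 0.301
    return (baseline_tokens / tokens) ** exponent


# Compare the default limit from the thread with two larger limits.
for limit in (300_000, 1_000_000, 3_000_000):
    ratio = relative_error(limit, baseline_tokens=300_000)
    print(f"{limit:>9,} tokens -> relative error {ratio:.2f}")
```

Under these assumptions, going from 300,000 to 1,000,000 tokens would cut the error rate by only about 30%, and only once the database has actually grown to that size.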

5:43 pm
November 26, 2009


front243

Denmark

New Member

posts 2

Thanks. I tried raising it to 1 million, and I see no slowdown on my system.

The problem with usenet is not only spam but also usenet-"trolls". I hope this filter can eventually sort them out automatically, but I am still unsure whether it will be 100% effective against them as well.

I am subscribed to a group with lots of messages: much useful information, but also lots of "trolls" and noise. I tried marking everything as good or bad (about 500 messages). Already today I think it's pretty good at spotting the junk, but of course there are a few false positives and negatives.

9:41 pm
November 26, 2009


rkent

Admin

posts 140

"Thanks. I tried raising it to 1 million, and I see no slowdown on my system."

You would not see it right away. As you train, the number of tokens in use goes up. Eventually you will hit the limit, and then the algorithm that prunes the database kicks in, reducing the number of tokens by about half all at once. So it might take months or years before the number of tokens grows to the point where such a large limit starts to affect your system. (By the way, you can see the number of tokens currently in use with JunQuilla, under tools/options/security.)
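The growth-and-prune cycle described above can be sketched as a toy simulation. It tracks only the token count and assumes a constant, made-up rate of 2,000 new tokens per day of training. The real pruning code also decides which tokens to drop, which this sketch ignores. `simulate_token_growth` is a hypothetical name.

```python
def simulate_token_growth(limit: int, new_tokens_per_day: int, days: int):
    """Return a list of (day, count_before_prune, count_after_prune).

    Toy model: the count grows by a fixed amount each day, and when it
    reaches the limit the database is pruned to about half its size.
    """
    count = 0
    prunes = []
    for day in range(1, days + 1):
        count += new_tokens_per_day
        if count >= limit:
            pruned = count // 2  # "about half, all at once"
            prunes.append((day, count, pruned))
            count = pruned
    return prunes


# With a 1,000,000 limit and the assumed training rate, the first prune
# only happens after well over a year.
for day, before, after in simulate_token_growth(1_000_000, 2_000, 800):
    print(f"day {day}: pruned {before:,} -> {after:,} tokens")
```

At that assumed rate, a limit of 1,000,000 is first reached on day 500, which is why raising the limit shows no immediate effect on speed.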

"The problem with usenet is not only spam but also usenet-"trolls"."

What you are describing is closely related to junk, but is a little different. You would get a better result if you used a Bayes filter trained specifically for that purpose. In TB3, the internal Bayes filter now supports multiple such characteristics (which I call "traits"), but a user interface for them still needs to be provided. My extension TaQuilla was a first attempt to do that, but that extension is now effectively obsolete. I hope to get back to it soon to see if perhaps we could support use cases such as the one you are describing.
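To illustrate the idea of a separate filter per trait, here is a minimal sketch of a generic naive Bayes classifier with add-one smoothing. It is not Thunderbird's internal algorithm or TaQuilla's code. The `TraitFilter` class, the whitespace tokenizer, and the training messages are all made up for the example.

```python
import math
from collections import Counter


class TraitFilter:
    """Toy two-class naive Bayes filter for a single trait (e.g. "troll")."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.messages = {True: 0, False: 0}

    def train(self, text: str, has_trait: bool) -> None:
        self.counts[has_trait].update(text.lower().split())
        self.messages[has_trait] += 1

    def probability(self, text: str) -> float:
        """Estimated probability that the text has the trait."""
        vocab = set(self.counts[True]) | set(self.counts[False])
        total_yes = sum(self.counts[True].values()) + len(vocab)
        total_no = sum(self.counts[False].values()) + len(vocab)
        # Start from the (smoothed) prior odds, then add per-token evidence.
        log_odds = math.log((self.messages[True] + 1) / (self.messages[False] + 1))
        for token in text.lower().split():
            if token not in vocab:
                continue  # tokens never seen in training carry no evidence
            p_yes = (self.counts[True][token] + 1) / total_yes
            p_no = (self.counts[False][token] + 1) / total_no
            log_odds += math.log(p_yes / p_no)
        return 1 / (1 + math.exp(-log_odds))


# One independent filter per trait; only "troll" is trained here.
filters = {"junk": TraitFilter(), "troll": TraitFilter()}
troll = filters["troll"]
troll.train("you are all idiots and this group is garbage", True)
troll.train("idiots everywhere nobody here knows anything", True)
troll.train("the patch fixes the header parsing bug", False)
troll.train("thanks the new build fixes the crash", False)

print(troll.probability("this group is full of idiots"))
print(troll.probability("the build fixes the bug"))
```

Keeping one filter per trait means that training on "troll" messages does not change what the junk filter has learned, which is the benefit of a filter trained specifically for that purpose.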
