Usenet News and Token Count
November 25, 2009, 9:11 pm
front243, Denmark (New Member, 2 forum posts, member since November 25, 2009)

Hi, I use JunQuilla for email as well as for about a dozen Usenet newsgroups. I wonder whether it would be a good idea to increase the "Maximum Token Count" from 300,000 to a higher number now that I use it for news as well.

November 25, 2009, 10:11 pm
Admin (Moderator, 423 forum posts, member since July 12, 2008)

There is very little experience with junk management for newsgroups in Thunderbird, so I would be happy to hear about yours. In general, though, the more tokens the better, so if you are not having performance (that is, speed) problems, you could increase the limit. But it takes roughly 10x more tokens to double accuracy, so there is clearly a point of diminishing returns.
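To put the "10x more tokens to double accuracy" rule of thumb into numbers, here is a small Python sketch. It assumes the rule holds smoothly between those points (so the gain is 2 raised to the power of log10 of the token ratio) and reads "accuracy" loosely, for example as halving the error rate. The function name `accuracy_gain` is hypothetical and not part of JunQuilla or Thunderbird.

```python
import math


def accuracy_gain(old_tokens: int, new_tokens: int) -> float:
    """Accuracy multiplier implied by the rule of thumb '10x tokens doubles accuracy'."""
    # Each tenfold increase in tokens doubles accuracy, so the gain is 2 ** log10(ratio).
    return 2 ** math.log10(new_tokens / old_tokens)


# Raising the limit from 300,000 to 1 million tokens:
print(f"{accuracy_gain(300_000, 1_000_000):.2f}")  # prints 1.44
```

Under that assumption, a jump from 300,000 to 1 million tokens buys only about a 1.4x improvement, which is why the gains are real but modest.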

Currently, a core bug prevents the tokenizer from separating the headers of newsgroup messages from the body, so filtering is less accurate than for email (and newsgroup messages look very different anyway). But newsgroup spammers are not as clever as email spammers (they haven't had to be!), so overall my guess is that it works OK. But I have little actual experience with it.
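To illustrate why separating headers from the body matters, here is a hypothetical Python tokenizer (not Thunderbird's actual code). It uses the standard `email` module to split a message at the first blank line and prefixes each header word with the header name, so "free" in the Subject and "free" in the body become distinct tokens. The word pattern and its 3-character minimum are assumptions made for this sketch.

```python
import re
from email import message_from_string

# Assumed word pattern: runs of 3 or more letters, digits or apostrophes.
WORD = re.compile(r"[A-Za-z0-9']{3,}")


def tokenize(raw_message: str) -> list[str]:
    """Hypothetical tokenizer that keeps header tokens and body tokens apart."""
    msg = message_from_string(raw_message)  # headers end at the first blank line
    tokens = []
    for name, value in msg.items():
        # Prefix header words with the header name, e.g. "subject:free".
        tokens += [f"{name.lower()}:{w.lower()}" for w in WORD.findall(str(value))]
    body = msg.get_payload()
    if isinstance(body, str):  # single-part message; multipart bodies are skipped here
        tokens += [w.lower() for w in WORD.findall(body)]
    return tokens


raw = "Subject: Free offer\nNewsgroups: alt.test\n\nGet free stuff here\n"
print(tokenize(raw))
```

Without the header/body split, "subject:free" and "free" would collapse into one token, and the filter would lose the information that a word appeared in a header.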

But to directly answer your question, yes it is probably a good idea, but don't expect dramatic changes.

November 26, 2009, 5:43 pm
front243, Denmark (New Member, 2 forum posts, member since November 25, 2009)

Thanks. I tried raising it to 1 million and I see no slowdown on my system.

The problem with Usenet is not only spam but also Usenet "trolls". I hope this filter can eventually sort those out automatically, but I am still unsure whether it will be 100% effective against them as well.

I am subscribed to a group with lots of messages: much useful information, but also lots of "trolls" and noise. I tried marking everything as good or bad (about 500 messages). Already I think it's pretty good at spotting the junk, but of course there are a few false positives and false negatives.

November 26, 2009, 9:41 pm
Admin (Moderator, 423 forum posts, member since July 12, 2008)

"Thanks. I tried to raise it to 1 mio. and I see no slowdown on my system."

You would not see it right away. As you train, the number of tokens in use goes up. Eventually you will hit the limit, and then the algorithm that prunes the database kicks in, reducing the number of tokens by about half all at once. So it might take months or years before the number of tokens grows to the point where such a large limit starts to affect your system. (By the way, JunQuilla shows the number of tokens currently in use under Tools > Options > Security.)
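A rough Python sketch of that grow-then-prune behaviour, assuming a simple scheme in which every token count is halved (integer division) and tokens whose count reaches zero are dropped. The `TokenStore` class and its methods are hypothetical, and Thunderbird's real pruning code may differ in detail. If many tokens have been seen only once, halving the counts removes a large share of tokens in one step, but the exact fraction depends on the count distribution.

```python
from collections import Counter


class TokenStore:
    """Hypothetical token database with a maximum token count."""

    def __init__(self, max_tokens: int) -> None:
        if max_tokens < 0:
            raise ValueError("max_tokens must be non-negative")
        self.max_tokens = max_tokens
        self.counts = Counter()  # token -> number of trained messages containing it

    def train(self, tokens: list[str]) -> None:
        self.counts.update(set(tokens))  # each token counted at most once per message
        # Nothing happens until the limit is exceeded; then pruning hits all at once.
        while len(self.counts) > self.max_tokens:
            self.prune()

    def prune(self) -> None:
        # Assumed scheme: halve every count; tokens whose count drops to 0 disappear.
        self.counts = Counter(
            {tok: c // 2 for tok, c in self.counts.items() if c // 2 > 0}
        )


store = TokenStore(max_tokens=5)
store.train(["usenet", "news", "troll"])
store.train(["usenet", "news", "kernel"])
print(len(store.counts))  # prints 4: still under the limit
store.train(["usenet", "spam", "offer"])  # 6 distinct tokens > 5, so pruning kicks in
print(sorted(store.counts.items()))  # prints [('news', 1), ('usenet', 1)]
```

With a limit of 1 million instead of 5, the same logic simply takes far longer to reach the pruning step, which matches why no slowdown shows up right away.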

"The problem with usenet is not only spam but also usenet-"trolls"."

What you are describing is closely related to junk, but a little different. You would get a better result from a bayes filter trained specifically for that purpose. In TB3 (Thunderbird 3), the internal bayes filter now supports multiple such characteristics (which I call "traits"), but a user interface for them still needs to be provided. My extension TaQuilla was a first attempt at that, but it is now effectively obsolete. I hope to get back to it soon to see whether we could support use cases such as the one you are describing.
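A minimal sketch of the "traits" idea, assuming one independent set of token counts per trait so that "troll" can be trained separately from "junk". It uses a simplified naive Bayes score with add-one smoothing that only looks at tokens present in the message. This is a stand-in for illustration, not Thunderbird's actual combining method, and the `TraitFilter` class is hypothetical.

```python
import math
from collections import Counter, defaultdict


class TraitFilter:
    """Hypothetical multi-trait filter: separate token counts for each trait."""

    def __init__(self) -> None:
        self.pro = defaultdict(Counter)   # trait -> token counts in messages WITH the trait
        self.anti = defaultdict(Counter)  # trait -> token counts in messages WITHOUT it
        self.n_pro = Counter()            # trait -> messages trained as having the trait
        self.n_anti = Counter()           # trait -> messages trained as lacking the trait

    def train(self, trait: str, tokens: list[str], has_trait: bool) -> None:
        distinct = set(tokens)  # count each token at most once per message
        if has_trait:
            self.pro[trait].update(distinct)
            self.n_pro[trait] += 1
        else:
            self.anti[trait].update(distinct)
            self.n_anti[trait] += 1

    def score(self, trait: str, tokens: list[str]) -> float:
        """Score between 0 and 1 that the message has `trait`."""
        n_p, n_a = self.n_pro[trait], self.n_anti[trait]
        # Start from the smoothed prior odds of the trait.
        log_odds = math.log((n_p + 1) / (n_a + 1))
        for tok in set(tokens):
            p = (self.pro[trait][tok] + 1) / (n_p + 2)   # P(token appears | trait)
            a = (self.anti[trait][tok] + 1) / (n_a + 2)  # P(token appears | not trait)
            log_odds += math.log(p / a)
        # Numerically stable logistic function mapping log-odds to 0..1.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1 + e)


f = TraitFilter()
f.train("troll", ["idiot", "stupid", "lol"], has_trait=True)
f.train("troll", ["idiot", "flame"], has_trait=True)
f.train("troll", ["driver", "kernel", "patch"], has_trait=False)
f.train("troll", ["kernel", "build"], has_trait=False)
print(round(f.score("troll", ["idiot", "stupid"]), 2))  # prints 0.86
```

Because each trait keeps its own counts, training "troll" leaves "junk" untouched: an untrained trait scores 0.5 for any message.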

September 1, 2012, 12:07 am
Luis (Guest; post awaiting moderation)

Forum Timezone: UTC -8
