July 12, 2008
Today I released a version of JunQuilla that supports SeaMonkey 2.0 and the latest versions of Thunderbird, including the upcoming 3.0RC1 and 3.0.0. The new version can be downloaded from the AMO site here. I've also submitted this version for review so that it can get out of experimental status.
JunQuilla is my attempt to extend the user interface in the Mozilla mailnews product to provide the information that I believe is needed to properly manage the Bayesian junk filter. I suppose that most of these features should really be in the core product, but I found little support for that, so I decided to do most of this in an extension instead. The backend features it relies on have only been added to the core code in the last couple of years, so this extension will only work on newer versions of the Mozilla email clients (Thunderbird 3.x and SeaMonkey 2.x).
Version 1.0.0 fixes some bugs that have been reported in previous releases, provides partial support for SeaMonkey (except for the "Uncertain" folders), and adds a number of new features:
You can set critical overall junk options in the standard junk options screen (previously, this was only possible in the more obscure Add-ons/Options area). In Thunderbird, select Tools/Options/Security/Junk. In SeaMonkey, select Edit/Preferences/Mail & Newsgroups/JunQuilla. There you will see a display like this:
"Junk threshold" is the percentage score, calculated by the Bayesian classifier for each message, above which a message will be classified as junk. Set it as low as possible, but always high enough that no real messages are classified as junk. The default value of 90 is much too high for a well-trained Bayesian classifier.
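To make the effect of the threshold concrete, here is a minimal sketch in Python. It is not Mozilla's code: the function name `classify` and the message scores are hypothetical, and it assumes a score strictly above the threshold counts as junk (the exact boundary comparison in the real classifier may differ).

```python
def classify(junk_percent: int, threshold: int = 90) -> str:
    """Label a message from its junk score (0-100).

    Assumption: a score strictly above the threshold counts as junk.
    """
    if not 0 <= junk_percent <= 100:
        raise ValueError("junk_percent must be between 0 and 100")
    return "junk" if junk_percent > threshold else "good"


# Hypothetical scores for three messages.
scores = {"offer": 97, "newsletter": 62, "friend": 3}
for name, score in scores.items():
    # Compare the default threshold (90) with a lower one (50).
    print(name, classify(score), classify(score, threshold=50))
```

With the lower threshold, the 62% "newsletter" moves from good to junk. That is only safe once the classifier is trained well enough that no real mail scores that high.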
The "Maximum token count" is a measure of the resources that the junk classifier will use. The higher it is set, the more accurate your classifications will be. The default value of 100,000 is probably too low for good classification performance. I've had good results with 300,000 - and JunQuilla will set your value to this when first installed. If this value is too high, and you have trained a lot of messages, then memory usage may be excessive.
The other parameters are read only, and are displays of values from your training file. The "Current token count" shows how many junk training tokens (which are like words) are currently in use. You probably won't get good performance until this number is over 10,000 - and it really should be more like 100,000. "Good" and "Junk" messages trained shows how many messages have been used to train the junk filter. Ideally the number of junk and good messages should be more or less equal. If they are not, then pick some previously untrained messages and train them.
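The rules of thumb above can be written down as a small checklist. This is a hypothetical helper, not something JunQuilla provides: `training_advice` and its messages are illustrative, and it simply applies the 10,000 and 100,000 token guidelines and suggests how many messages of the under-trained class would even out the good/junk counts.

```python
def training_advice(token_count: int, good_trained: int, junk_trained: int) -> list[str]:
    """Hypothetical checklist based on the token-count and balance guidelines."""
    advice = []
    if token_count < 10_000:
        advice.append("too few tokens for good performance")
    elif token_count < 100_000:
        advice.append("usable, but aim for about 100,000 tokens")
    # Aim for roughly equal numbers of good and junk training messages.
    gap = junk_trained - good_trained
    if gap > 0:
        advice.append(f"train about {gap} more good messages")
    elif gap < 0:
        advice.append(f"train about {-gap} more junk messages")
    return advice


print(training_advice(5_000, 100, 300))
```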
When the number of tokens exceeds the maximum value, Mozilla mailnews will prune the training file in one large chunk, typically reducing both the number of trained messages and the number of tokens by about half.
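A rough model of that pruning step is sketched below. It is a simplification under stated assumptions, not the actual mailnews implementation: the `TrainingData` class and `maybe_prune` function are hypothetical, and the real reduction is only "about half" rather than an exact halving.

```python
from dataclasses import dataclass


@dataclass
class TrainingData:
    """Hypothetical summary of the training file's counters."""
    token_count: int
    good_messages: int
    junk_messages: int


def maybe_prune(data: TrainingData, max_tokens: int) -> TrainingData:
    """Simplified model: once tokens exceed the maximum, halve everything."""
    if data.token_count <= max_tokens:
        return data
    return TrainingData(
        token_count=data.token_count // 2,
        good_messages=data.good_messages // 2,
        junk_messages=data.junk_messages // 2,
    )


print(maybe_prune(TrainingData(310_000, 4_000, 3_000), max_tokens=300_000))
```

This is why a larger "Maximum token count" matters: with a low limit, a well-trained filter keeps losing half of its training data.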
Disable/enable junk processing for a folder.
You can set an "inherited folder property" to allow you to selectively enable or disable junk processing for folders. This has two main uses.
- If you have server-side filters that process email in IMAP, then you may already know that certain folders contain either junk mail or good mail, and you don't want to waste time processing them locally, or take the risk that they will be processed incorrectly.
- Mozilla mailnews core code now supports junk processing of RSS and News folders. You can select certain RSS or News folders, and then junk processing will run on new posts to those folders. This will also enable the standard user interface features that allow you to train messages as good or junk in those folders.
To set this, right click on a folder in the folder tree, and select Properties, then the "General Information" tab. At the bottom, you will see this:
"Analyze Junk" is an inherited folder property. This means that each folder can either get its value from its parent or have it set locally. The default value depends on the characteristics of the folder itself; for example, it is disabled by default in News but enabled in IMAP. To change the value, first uncheck the "Inherit" checkbox, then set the value that you want in "Enabled". If you change the value for a folder, the value will also change for the children of that folder (assuming that they still have the default "Inherit" checked).
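The inheritance rule can be modeled in a few lines. This is a hypothetical sketch of the behavior described above, not Mozilla's folder API: the `Folder` class, its fields, and the assumption that the type-dependent default sits at the top of each account's folder tree are all illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Folder:
    """Hypothetical folder with an inherited "Analyze Junk" property."""
    name: str
    parent: Optional["Folder"] = None
    inherit: bool = True              # the "Inherit" checkbox
    enabled: Optional[bool] = None    # the local "Enabled" value, used when inherit is False
    type_default: bool = False        # assumed default for the account type

    def analyze_junk(self) -> bool:
        # A locally set value wins when "Inherit" is unchecked.
        if not self.inherit and self.enabled is not None:
            return self.enabled
        # Otherwise take the parent's effective value.
        if self.parent is not None:
            return self.parent.analyze_junk()
        # The top of the tree falls back to the account-type default.
        return self.type_default


news = Folder("news.example.com", type_default=False)
group = Folder("comp.lang.python", parent=news)
imap = Folder("imap.example.com", type_default=True)
inbox = Folder("Inbox", parent=imap)
work = Folder("Work", parent=inbox, inherit=False, enabled=False)
project = Folder("Project", parent=work)
print(group.analyze_junk(), inbox.analyze_junk(), project.analyze_junk())
```

Turning off junk processing on "Work" also turns it off for "Project", because "Project" still inherits.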
Toolbar "Is Junk" and "Is Good" button
You can add two new buttons to your toolbar - "Is Junk" and "Is Good". Here's what they look like, next to a standard "Junk/Not Junk" icon:
To add these buttons, right click on a toolbar, select "Customize", then drag the buttons to the desired location.
The standard Junk button will always show as "Junk" when it thinks a message is good, and "Good" when it thinks that a message is junk. But that means that we can only classify a message as "Good" when it has been falsely classified as junk, and we never want our junk filter to do that. The "Is Good" button is meant to be used in the "Uncertain" folders to give you a means to train a message as "Good" there.
July 12, 2008
Neil Rashbrook said: At least in SeaMonkey, you can also mark an uncertain message as not junk using Shift+J or the Message - Mark - As Not Junk menuitem.
The same is true in Thunderbird. Marking as good is possible, just not as easy as marking as junk. Also, I have a special use case: every morning, while eating breakfast, I read email on a tablet computer with no keyboard, so I have to rely on the mouse equivalent (a touch pen) for actions. With a mouse, reducing the number of positioning and clicking actions is key to productivity. A big toolbar button is much easier to activate than opening a menu, navigating to an option, then clicking it.
Reducing the barriers to training of good emails is a real key to better junk filtering performance. I'm also experimenting with automatic training. Right now, I am using the "Train as Good" custom filter action from FiltaQuilla to automatically train messages as good where the sender is in my address book, but not in my domain (which is sometimes spoofed).
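The logic of that automatic-training rule can be expressed as a simple predicate. In practice it is configured as a message filter in the UI rather than written as code; the function `should_train_as_good`, the address book contents, and the domain names here are hypothetical placeholders.

```python
def should_train_as_good(sender: str, address_book: set[str], my_domain: str) -> bool:
    """Hypothetical rule: sender is in the address book but not in my own domain."""
    sender = sender.strip().lower()
    domain = sender.rpartition("@")[2]
    in_book = sender in {address.lower() for address in address_book}
    # Mail claiming to come from my own domain is excluded because it may be spoofed.
    return in_book and domain != my_domain.lower()


book = {"alice@example.org", "bob@mydomain.com"}
print(should_train_as_good("Alice@Example.org", book, "mydomain.com"))
```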