JunQuilla version 1.0.0 released

November 16, 2009

Today I released a version of JunQuilla that supports SeaMonkey 2.0 and the latest versions of Thunderbird, including the upcoming 3.0RC1 and 3.0.0. The new version can be downloaded from the AMO site. I’ve also submitted this version for review so that it can get out of experimental status.

JunQuilla is my attempt to extend the user interface in the Mozilla mailnews product to provide the information that I believe is needed to properly manage the Bayesian junk filter. I suppose that most of these features should really be in the core product, but I found that support for that was not very strong, so I decided to do most of this in an extension instead. The backend features that the extension relies on have only been added to the core code in the last couple of years, so this extension will only work on newer versions of the Mozilla email clients (Thunderbird 3.* and SeaMonkey 2.*).

Version 1.0.0 fixes some bugs that have been reported in previous releases, provides partial support for SeaMonkey (except for the “Uncertain” folders), and adds a number of new features:

Junk Options

You can set critical overall junk options in the standard junk options screen (previously, this was only possible in the more obscure addons/options area). In Thunderbird, select Tools/Options/Security/Junk. In SeaMonkey, select Edit/Preferences/Mail & Newsgroups/JunQuilla. There you will see a display like this:

JunQuilla Options

“Junk threshold” is the percentage score, as calculated by the Bayes classifier for each message, above which a message will be classified as junk. This should be set as low as possible, though always high enough to avoid having any real messages classified as junk. The default value of 90 is much too high for a well-trained Bayes classifier.
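As a rough illustration of what the threshold does, here is a minimal Python sketch. The function name, the sample scores, and the strict "greater than" comparison at the boundary are my assumptions for illustration; this is not the mailnews code itself.

```python
def is_junk(score_percent, threshold_percent=90.0):
    """Return True when a message's junk score is above the threshold.

    Assumption: a score exactly equal to the threshold is treated as good.
    """
    return score_percent > threshold_percent


# Hypothetical classifier scores (percent) for three messages.
scores = {"newsletter": 42.0, "borderline": 85.0, "obvious spam": 99.5}

for threshold in (90.0, 80.0):
    flagged = [name for name, score in scores.items() if is_junk(score, threshold)]
    print(f"threshold {threshold}: flagged {flagged}")
```

Lowering the threshold from 90 to 80 catches the borderline message as well, which is why a lower value is only safe once the classifier is well trained.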

The “Maximum token count” is a measure of the resources that the junk classifier will use. The higher it is set, the more accurate your classifications will be. The default value of 100,000 is probably too low for good classification performance. I’ve had good results with 300,000 – and JunQuilla will set your value to this when first installed. If this value is too high, and you have trained a lot of messages, then memory usage may be excessive.

The other parameters are read-only, and display values from your training file. The “Current token count” shows how many junk training tokens (which are like words) are currently in use. You probably won’t get good performance until this number is over 10,000 – and it really should be more like 100,000. “Good” and “Junk” messages trained show how many messages have been used to train the junk filter. Ideally the number of junk and good messages should be more or less equal. If they are not, then pick some previously untrained messages and train them.
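The rules of thumb above can be written down as a small check. This is only a sketch: the `TrainingStats` type, the function name, and the 2:1 imbalance limit are hypothetical choices of mine (the only figure taken from the text is the 10,000 token minimum), and the numbers would be read by hand from the options screen.

```python
from dataclasses import dataclass


@dataclass
class TrainingStats:
    token_count: int     # "Current token count"
    good_messages: int   # "Good" messages trained
    junk_messages: int   # "Junk" messages trained


def training_warnings(stats, min_tokens=10_000, max_imbalance=2.0):
    """Return a list of warnings about the state of the training data.

    Assumption: "more or less equal" is read as a ratio of at most 2:1.
    """
    warnings = []
    if stats.token_count < min_tokens:
        warnings.append("too few tokens")
    low = min(stats.good_messages, stats.junk_messages)
    high = max(stats.good_messages, stats.junk_messages)
    if high > 0 and (low == 0 or high / low > max_imbalance):
        warnings.append("unbalanced training")
    return warnings


print(training_warnings(TrainingStats(5_000, 100, 100)))
print(training_warnings(TrainingStats(150_000, 400, 100)))
print(training_warnings(TrainingStats(150_000, 120, 100)))
```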

When the number of tokens exceeds the maximum value, Mozilla mailnews will prune the training file in a large chunk, typically reducing both the number of trained messages and the number of tokens by about half.
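To make the pruning behavior more concrete, here is one way such a step could work, sketched in Python. This is an assumed model, not the actual mailnews implementation: it halves every token's count, drops tokens whose count falls to zero, and halves the trained message counts. All names are hypothetical.

```python
def prune(tokens, good, junk):
    """Halve all counts; tokens whose halved count is zero are dropped."""
    halved = {tok: count // 2 for tok, count in tokens.items() if count // 2 > 0}
    return halved, good // 2, junk // 2


def maybe_prune(tokens, good, junk, max_tokens):
    """Prune only when the number of distinct tokens exceeds the maximum."""
    if len(tokens) > max_tokens:
        return prune(tokens, good, junk)
    return tokens, good, junk


# Tiny hypothetical training set: token -> number of times seen.
tokens = {"viagra": 4, "hello": 1, "meeting": 3, "free": 2}
pruned, good, junk = maybe_prune(tokens, good=10, junk=7, max_tokens=3)
print(pruned, good, junk)
```

In this model, tokens seen only once disappear entirely, so how far the token count drops depends on how many rare tokens the training file contains.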

Disable/enable junk processing for a folder

You can set an “inherited folder property” to allow you to selectively enable or disable junk processing for folders. This has two main uses.

  • If you have server-side filters that process email in IMAP, then you may already know that certain folders contain either junk mail or good mail, and don’t want to waste time processing them locally – or take the risk that they will be processed incorrectly.
  • Mozilla mailnews core code now supports junk processing of RSS and News folders. You can select certain RSS or News folders, and then junk processing will run on new posts to those folders. This will also enable the standard user interface features that allow you to train messages as good or junk in those folders.

To set this, right-click on a folder in the folder tree, and select Properties, then the “General Information” tab. At the bottom, you will see this:

JunQuillaFolderProperties

“Analyze Junk” is an inherited folder property. What that means is that each folder can either get its value from its parent, or have it set locally. The default value depends on the characteristics of the folder itself. So for example, this would be disabled in News by default, but enabled in IMAP. To change the value, first uncheck the “Inherit” checkbox, then set the value that you want in “Enabled”. If you change a value for a folder, then the value will also change for the children of that folder (assuming that they have the default “Inherit” checked).
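The lookup for an inherited property can be pictured with a short Python sketch. The `Folder` class and its attribute names are hypothetical, and the model is simplified: a folder with a local value uses it, otherwise it asks its parent, and the top-level folder falls back to a type-dependent default.

```python
class Folder:
    def __init__(self, name, parent=None, default_enabled=True):
        self.name = name
        self.parent = parent
        # Type-dependent default, e.g. True for IMAP, False for News.
        self.default_enabled = default_enabled
        # None models the "Inherit" checkbox being checked.
        self.local_value = None

    def analyze_junk(self):
        """Resolve the effective "Analyze Junk" value for this folder."""
        if self.local_value is not None:
            return self.local_value
        if self.parent is not None:
            return self.parent.analyze_junk()
        return self.default_enabled


imap = Folder("IMAP account", default_enabled=True)
inbox = Folder("Inbox", parent=imap)
lists = Folder("Lists", parent=inbox)

print(lists.analyze_junk())   # inherited from the account default
inbox.local_value = False     # uncheck "Inherit" on Inbox, disable junk analysis
print(lists.analyze_junk())   # the child follows its parent
lists.local_value = True      # override locally on the child
print(lists.analyze_junk())
```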

Toolbar “Is Junk” and “Is Good” buttons

You can add two new buttons to your toolbar – “Is Junk” and “Is Good”. Here’s what they look like, next to a standard “Junk/Not Junk” icon:

JunQuilla Is Good or Is Junk toolbar button

To add these buttons, right-click on a toolbar, select “Customize”, then drag the buttons to the desired location.

The standard Junk button will always show as “Junk” when it thinks a message is good, and “Good” when it thinks that a message is junk. But that means that we can only classify a message as “Good” when it has been falsely classified as junk, and we never want our junk filter to do that. The “Is Good” button is meant to be used in the “Uncertain” folders to give you a means to train a message as “Good” there.