ToneQuilla version 1.0.1

November 28, 2009 – 9:00 pm

ToneQuilla version 1.0.1 has been posted on AMO for review (or is available on this site here.) This fixes a bug reported in the forum, where for some users .wav files were playing in the default media player, instead of using Mozilla’s internal code.

Maybe I need a search extension – SearchaQuilla?

November 20, 2009 – 11:13 am

The last few weeks I’ve been adding custom search terms to my FiltaQuilla extension using the new nsIMsgSearchCustomTerm interface; these terms can then be used in searches, virtual folders, or filters. But I keep coming up with new things that I want to do. That delays my packaging of FiltaQuilla 1.0.0 for non-experimental release. Maybe I should quit adding this stuff to FiltaQuilla (which is already pretty large with all of its filter actions) and define a new search-oriented extension, probably called SearchaQuilla?

So far, I have added the following new search terms:

BCC – locate items in the BCC field

Subject Regex – search the subject using a javascript regular expression

Header Regex – search any specific header using a javascript regular expression

Javascript – load javascript in a text field, and program your own search given an nsIMsgDBHdr object

Tag of Thread Head – match a tag in the head of a message’s thread

Tag of Thread Messages – match a tag near the message in its thread (within +/- 10 messages by default)

Address in Thread – match an address near the message in its thread (within +/- 10 messages by default)
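The two “near the message in its thread” terms boil down to a window check around the message’s position in the thread. Here is a minimal sketch of that idea in standalone Python. It is not the FiltaQuilla code (which is JavaScript working on nsIMsgDBHdr objects); the ThreadMessage class, the function name, and the choice to include the message itself in the window are all my assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadMessage:
    """Hypothetical stand-in for a message header (not nsIMsgDBHdr)."""
    author: str
    tags: set = field(default_factory=set)


def tag_near_message(thread, index, tag, window=10):
    """Return True if any message within +/- `window` positions of
    thread[index] (the message itself included) carries `tag`."""
    start = max(0, index - window)
    end = min(len(thread), index + window + 1)
    return any(tag in msg.tags for msg in thread[start:end])


# A 30-message thread where only message 5 is tagged.
thread = [ThreadMessage("a@example.com") for _ in range(30)]
thread[5].tags.add("important")

print(tag_near_message(thread, 12, "important"))  # message 5 is 7 positions away
print(tag_near_message(thread, 20, "important"))  # message 5 is 15 positions away
```

An “Address in Thread” check would have the same shape, comparing each message’s author instead of its tags.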

This stuff can be useful outside of filters; in fact, I am mostly using these terms personally to define virtual folders. So I’ll probably move them to a new extension, and try to get FiltaQuilla out the door finally.

rkent

JunQuilla version 1.0.0 released

November 16, 2009 – 1:08 pm

Today I released a version of JunQuilla that supports SeaMonkey 2.0, and the latest versions of Thunderbird including the upcoming 3.0RC1 and 3.0.0. The new version can be downloaded from the AMO site here. I’ve also submitted this version for review so that it can get out of experimental status.

JunQuilla is my attempt to extend the user interface in the Mozilla mailnews product to provide the information that I believe is needed to properly manage the bayesian junk filter. I suppose that most of these features should really be in the core product, but I found that support for that was not very strong, so I decided to do most of this in an extension instead. These backend features have only been added to the core code in the last couple of years, so this extension will only work on newer versions of the Mozilla email clients (Thunderbird 3.* versions, and SeaMonkey 2.* versions.)

Version 1.0.0 fixes some bugs that have been reported in previous releases, provides partial support for SeaMonkey (except for the “Uncertain” folders), and adds a number of new features:

Junk Options

You can set critical overall junk options in the standard junk options screen (previously, this was only possible in the more obscure addons/options area). In Thunderbird, select Tools/Options/Security/Junk. In SeaMonkey, select Edit/Preferences/Mail & Newsgroups/JunQuilla. There you will see a display like this:

JunQuilla Options

“Junk threshold” is the percentage value as calculated by the bayes classifier for each message, above which a message will be classified as junk. This should be set as low as possible, though always high enough to avoid having any real messages classified as junk. The default value of 90 is much too high for a well-trained bayes classifier.

The “Maximum token count” is a measure of the resources that the junk classifier will use. The higher it is set, the more accurate your classifications will be. The default value of 100,000 is probably too low for good classification performance. I’ve had good results with 300,000 – and JunQuilla will set your value to this when first installed. If this value is too high, and you have trained a lot of messages, then memory usage may be excessive.

The other parameters are read only, and are displays of values from your training file. The “Current token count” shows how many junk training tokens (which are like words) are currently in use. You probably won’t get good performance until this number is over 10,000 – and it really should be more like 100,000. “Good” and “Junk” messages trained shows how many messages have been used to train the junk filter. Ideally the number of junk and good messages should be more or less equal. If they are not, then pick some previously untrained messages and train them.

When the number of tokens exceeds the maximum value, Mozilla mailnews will prune the training file in a large chunk, typically reducing both the number of trained messages and the number of tokens by about half.

Disable/enable junk processing for a folder.

You can set an “inherited folder property” to allow you to selectively enable or disable junk processing for folders. This has two main uses.

  • If you have server-side filters that process email in IMAP, then you may already know that certain folders contain either junk mail or good mail, and don’t want to waste time processing them locally – or take the risk that they will be processed incorrectly.
  • Mozilla mailnews core code now supports junk processing of RSS and News folders. You can select certain RSS or News folders, and then junk processing will run on new posts to those folders. This will also enable the standard user interface features that allow you to train messages as good or junk in those folders.

To set this, right click on a folder in the folder tree, and select Properties, then the “General Information” tab. At the bottom, you will see this:

JunQuillaFolderProperties

“Analyze Junk” is an inherited folder property. What that means is that each folder can either get its value from its parent, or can be set locally. The default value depends on the characteristics of the folder itself. So for example, this would be disabled in News by default, but enabled in IMAP. To change the value, first reset the “Inherit” checkbox, then set the value that you want in “Enabled”. If you change a value for a folder, then the value will also change for the children of that folder (assuming that they have the default “Inherit” checked.)

Toolbar “Is Junk” and “Is Good” button

You can add two new buttons to your toolbar – “Is Junk” and “Is Good”. Here’s what they look like, next to a standard “Junk/Not Junk” icon:

JunQuilla Is Good or Is Junk toolbar button

To add these buttons, right click on a toolbar, select “Customize”, then drag the buttons to the desired location.

The standard Junk button will always show as “Junk” when it thinks a message is good, and “Good” when it thinks that a message is junk. But that means that we can only classify a message as “Good” when it has been falsely classified as junk, and we never want our junk filter to do that. The “Is Good” button is meant to be used in the “Uncertain” folders to give you a means to train a message as “Good” there.

GlodaQuilla version 0.3.0 released

November 12, 2009 – 3:46 pm

This release is intended to update GlodaQuilla to work with Thunderbird version 3.0 (including release candidate RC1). You can download the new version of GlodaQuilla here. (GlodaQuilla’s main function is to provide access to information about the indexing status of each Thunderbird message in the “gloda” global database.)

In addition to the gloda columns, this release adds an experimental feature to allow overriding of the default gloda configuration concerning which messages are indexed. This new inherited folder property is called “Index in Global Database”. Access to inherited properties such as this is discussed in the post “Inherited Folder Properties – revisited”.

You can use this property to disable indexing of servers or folders that are normally allowed, or to enable indexing on folders that are normally disabled (such as trash or junk folders). In my system, for example, for testing purposes I download my email both as IMAP and as POP3, but I disable the indexing of IMAP to prevent double indexing of the same messages.

This new property does not make any attempt to fix existing indexing when it is changed. Minimal support for this hopefully will be added for the 1.0.0 release of GlodaQuilla.

ToneQuilla 1.0.0 released

November 10, 2009 – 12:22 pm

I’ve now released ToneQuilla version 1.0.0 on AMO. This allows users in Thunderbird 3.0 and SeaMonkey 2.0 to play a particular sound as a filter action, so that different types of emails can play different sounds.

In this release, I’ve fixed some bugs, plus added support for some new sound formats. If your operating system will launch a .mp3 file in a local player, you can now ask it to do that as part of the filter action. I also support .ogg files using Mozilla’s standard Ogg Vorbis player, though my experience has been that this is not reliable enough yet for real usage, at least on Windows XP.

You can download this from AMO, see a detailed description of ToneQuilla here, or see details of changes in this revision from the releases page.

I’ve also now submitted this for official review by the folks at AMO, so hopefully after a month this will be able to leave the experimental sandbox, and things like automatic updates should also work.

Make sure that you make a filter to play “We Are in Love” for the emails that you receive from your significant other!

rkent

Inherited Folder Properties – revisited

November 6, 2009 – 11:50 am

In a previous posting, I introduced the concept of inherited folder properties in the Mozilla mailnews products (Thunderbird and SeaMonkey). In the months since, I have incorporated these into my extensions quite significantly, so here I would like to show the UI I am currently using for this, and also discuss some of the issues that I face.

(All references to extensions in this posting refer to the 1.0.0 versions, which as of this writing have not been posted to AMO yet. But they should be available in a few weeks.)

Implemented UI

Briefly, an inherited property is a property that can be defined globally, at the server, or at the folder, and whose value is propagated to child objects. This makes it easy to specify precisely how the property is applied.

As an example, I have recently implemented a feature “Index in Global Database” in GlodaQuilla which can be used to selectively suppress certain accounts or folders from being accessed by the global database indexer. In the account manager, where indexing can be disabled for an entire account, the UI looks like this:

Index in Global Database account settings

Each inherited property has default values which are typically set by the base code. In the case of the gloda database indexer, everything but newsgroups is indexed by default. Initially each inherited property is set to just use the standard default processing, but if I clear the “default” checkbox, then I can turn off gloda indexing for this account.

If I do that, then go to a first-level folder in the account, I see the following under folder properties:

Index in Global Database by folder

At the folder level, because I disabled global indexing on the account, it is now shown as disabled on the folder. I could clear the inherit box and selectively enable it on just this folder and its children if I wanted.
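To make the lookup order concrete, here is a small sketch of how I think about resolving an inherited property: start at the folder, walk up through its parents to the server, and fall back to a global default if nobody set anything. This is a standalone Python model, not the mailnews core code; the Node class, the resolve function, and the property name are hypothetical.

```python
class Node:
    """Hypothetical account-tree node: either a server or a folder."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local = {}  # properties set explicitly on this node

    def set_property(self, prop, value):
        # Equivalent to clearing "Inherit" and choosing a value.
        self.local[prop] = value

    def clear_property(self, prop):
        # Equivalent to checking "Inherit" again.
        self.local.pop(prop, None)


def resolve(node, prop, global_default):
    """Return the first explicitly set value found while walking from
    the node up through its ancestors, else the global default."""
    current = node
    while current is not None:
        if prop in current.local:
            return current.local[prop]
        current = current.parent
    return global_default


server = Node("imap.example.com")
inbox = Node("Inbox", parent=server)
lists = Node("Lists", parent=inbox)

PROP = "index_in_global_database"  # illustrative name only

print(resolve(lists, PROP, True))   # nothing set anywhere, so the default applies
server.set_property(PROP, False)    # disable indexing for the whole account
print(resolve(lists, PROP, True))   # now inherited from the server
inbox.set_property(PROP, True)      # re-enable on one folder and its children
print(resolve(lists, PROP, True))
print(resolve(server, PROP, True))  # the account itself stays disabled
```

Storing only the explicitly set values, and computing everything else on lookup, is what lets a change at the account level show up immediately on every folder that still has “Inherit” checked.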

This particular UI merges naturally with the existing methods of setting properties in mailnews, but I’m not sure it is optimum for an inherited property. The inherited nature could be more clearly shown, and a particular feature more quickly configured, if I showed a tree of accounts and folders, with checkboxes next to each account to enable or inherit the feature. Maybe in a future version.

Implemented properties

Here are some of the implementations of inherited properties that exist in my extensions:

  1. (GlodaQuilla) Index in Global Database – suppresses the running of the global database indexer
  2. (FiltaQuilla) Apply Filters to Folder – for IMAP folders, allow incoming filters to run on that folder
  3. (JunQuilla) Analyze Junk – allow junk processing to be turned on or off. This also allows junk processing to run on RSS or news folders.
  4. (TaQuilla) Analyze particular automatic tags.

Issue: Existing mechanisms

Ideally, the inherited property would be the one and only way to manage a program feature. But for existing features, the existing mechanisms remain, which can lead to possible confusion. For example, with JunQuilla’s “Analyze Junk” property, there is existing UI to enable junk processing at the account level. Here the inherited property will always override the default mechanism (but that is mostly because I implemented it in core that way, and I have a little influence on how junk processing is handled in core.) For GlodaQuilla’s “Index in Global Database” the behaviour is different. Existing UI will only allow this to be enabled or disabled globally, and the inherited property does not override this. The inherited property code uses the default server preference as a global enable/disable for a property, so if gloda used that same mechanism instead of an independent preference, this issue would go away. I guess I could say the same thing about junk processing as well.

For FiltaQuilla’s “Apply Filters to Folder”, there is a subtle issue in the inherited nature. I did not implement in core the ability of the inherited property to override the existing default as applied to the Inbox, so incoming filters always run on the inbox. That creates two UI issues. First, although I show at the account level the “Apply Filters” option, it does not actually suppress application to the inbox as one would expect. Second, I currently do not show a folder property for “Apply Filters to Folder” for the IMAP inbox since it would not make sense there, so that also means there is no way to enable processing of filters for the children of the inbox. Maybe I should call this feature instead “Apply Filters to non-Inbox Folders” to solve this, or change the core code so that the feature also applies to the Inbox.

Issue: My RDF-inspired property for junk management

Looking ahead to a world where a number of extensions might try to define bayes filter traits, in code I recommended that properties used to manage junk processing use an RDF-inspired globally unique identifier. Then I followed my own advice and defined the identifier that controls junk processing on a folder as: “dobayes.mailnews@mozilla.org#junk”. Unfortunately, the existing account manager code does not allow periods in property names, which means I could not use the account manager to manage this. I’ve filed bug 525024 on this issue, and perhaps that can be incorporated after TB 3.0 / SM 2.0.

Issue: Missing inheritance levels

I’ve heard others comment that often they want to set a property on a particular class of folders, say on all Trash folders, or all Sent folders. I’ve considered implementing another level in the inherited properties feature, that would be a folder type. So you would then set a property that would be inherited by any folder of a particular type, and its children – and of course also overridden by the local folder property.

Issue: UI for global property

All of the inherited properties could also be enabled globally using the “mail.server.default.<property>” preference, but I did not give any UI for that in my extensions. I thought it would be too confusing for the user to show those preferences, which would be very similar to existing mechanisms. This is not an issue for properties that use the preference system for server-level issues, but none of the existing server-level preferences are also inherited properties. Perhaps we could move in that direction in the future.

Managing spam with “after classification” filters

August 28, 2009 – 1:12 pm

Nightly builds after 2009-08-19 of Thunderbird (or upcoming 3.0 beta 4) and SeaMonkey (or upcoming 2.0 beta 2) include a new ability to apply message filters after the internal spam filter has classified the message. Previously, filtering was always done before spam classification, which meant that you could not use any results of the spam classification in a filter.

The default spam processing that is available without using filters (whitelisting, move or delete messages with a sufficiently high threshold) should be sufficient for most users. But people with special requirements can now implement them in a custom filter. Let me give examples in this posting.

Using the “after classification” filters

Proper care and feeding of spam really needs to classify messages in three ways. Some messages can be easily detected as spam, and should never be looked at. Others are clearly ham, and should be treated as real. But those in the middle need some handling, which may mean either training on them, or perhaps examining them weekly to make sure no false positives are there. Default Mozilla mailnews (which is the generic term for features that are available in any of the applications created from this codebase, including Thunderbird and SeaMonkey) junk management doesn’t provide any capability to manage these uncertain emails. My JunQuilla extension provides an Uncertain folder which is focused on the training issue, but with the new filter features you can have more precise control of this. (Currently you can’t install JunQuilla in SeaMonkey, but I will fix that eventually).

First, let’s see what is new and how it can be enabled.

Create a new filter by selecting Tools/Message Filters … then New. Open up the search attributes menu, and you’ll see something like this:

NoJunkOptions

No junk options! To get those, you’ll need to first select one of the “after classification” contexts from “Apply Filter When”. Then you’ll see something like this:

WithJunkOptions

If “Checking mail (after classification)” is disabled, then you probably are trying to set an after-classification filter on a POP3 account that is actually sending its email to another location (the so-called deferred-to server). You need instead to set the “after classification” filter on the “deferred-to” server, which is typically Local Folders.

Let me explain each of these search attributes.

Junk Percent is the score returned from the bayes filter when classifying the message, with 100 being the most likely to be junk, and 0 the least likely. The default setting in Thunderbird classifies a message as junk when this score is 90 or greater. It is sort of a probability, but not really, because too many false assumptions are made in the Naive Bayesian Classifier for this to really be a probability. Just treat it like a score. Unfortunately default installs of Thunderbird and SeaMonkey do not provide you with any way to see the value of this on typical messages. JunQuilla though provides a custom column that shows this on each message so you can get a feel for what typical values are.

Junk Status is pretty simple: it either Is or Isn’t Junk. In the normal case where the internal bayes filter is used to classify the message, this means it had a junk percent greater than 90.

Junk Score Origin shows you who classified the message. Its values are:

Plugin: the bayes filter.

User: you manually classified this message as junk or good (not useful in an incoming filter, but maybe in a manual filter or search).

Filter: a previous filter action set the junk status.

Whitelist: the spam processing decided this message was good because it was from someone in your address book.

IMAP Flag: this message was classified by another system, so we know it is junk or good, but don’t know why the other system classified it this way. You might see this if you access mail from more than one computer.

Default Thunderbird does not support any way to see the junk score origin on individual messages, though JunQuilla provides a Junk Status + column which uses different icons for each junk score origin.

Classifying messages as uncertain

So let’s design a filter that will move messages to an Uncertain folder if we want to examine them, but not have them clutter the inbox. That’s pretty easy, we’ll just move messages with a junk percent in a certain range to that folder:

UncertainJunk
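The logic behind that filter is just a three-way split on the junk percent. As a minimal sketch in standalone Python (not filter code you can load into Thunderbird), with both cutoffs being illustrative assumptions that you would tune from the scores you actually see, and with “junk” assumed to mean a score at or above the threshold:

```python
def classify(junk_percent, uncertain_low=50, junk_threshold=90):
    """Bucket a 0-100 junk score into 'good', 'uncertain', or 'junk'.

    Both cutoffs are hypothetical defaults, not values from the filter
    shown in the screenshot.
    """
    if not 0 <= junk_percent <= 100:
        raise ValueError("junk percent must be between 0 and 100")
    if junk_percent >= junk_threshold:
        return "junk"
    if junk_percent >= uncertain_low:
        return "uncertain"
    return "good"


for score in (3, 50, 89, 90, 100):
    print(score, classify(score))
```

Only the middle bucket needs a filter of its own; the junk bucket is already handled by the default junk processing.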

The order of message processing in Mozilla mailnews is:

  1. Run normal filters (on each message as it is received)
  2. Check whitelisting (on a message batch, this and subsequent steps)
  3. Run bayes classifier on non-whitelisted messages, and mark messages as junk or good.
  4. Apply “after classification” filters.
  5. Apply junk message moves using default junk processing.

So at least in theory, you can apply the “after classification” filter to the Uncertain messages, and still let the default junk processing move junk messages to a Junk folder. (Testing of this is welcomed!)
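Here is a toy model of that five-step ordering, to show why the Uncertain filter and the default junk move do not step on each other. It is standalone Python with made-up scores; the function name, the dictionary keys, and the thresholds are all hypothetical, and the real code works per account on message headers rather than on dictionaries.

```python
def process_batch(messages, address_book, score_fn,
                  junk_threshold=90, uncertain_low=50):
    """Apply the five steps, in order, to a list of message dicts.

    Each message needs a 'sender' key. `score_fn` stands in for the
    bayes classifier and returns a 0-100 junk percent.
    """
    # 1. Normal filters run on each message as it is received
    #    (reduced here to leaving the message in the Inbox).
    for msg in messages:
        msg.setdefault("folder", "Inbox")

    # 2. Whitelist check on the batch.
    for msg in messages:
        msg["whitelisted"] = msg["sender"] in address_book

    # 3. Bayes classification of non-whitelisted messages only.
    for msg in messages:
        if msg["whitelisted"]:
            msg["percent"] = None  # whitelisted mail never gets a score
            msg["junk"] = False
        else:
            msg["percent"] = score_fn(msg)
            msg["junk"] = msg["percent"] >= junk_threshold

    # 4. "After classification" filters can now use the junk fields.
    for msg in messages:
        percent = msg["percent"]
        if percent is not None and uncertain_low <= percent < junk_threshold:
            msg["folder"] = "Uncertain"

    # 5. Default junk processing moves what is marked as junk.
    for msg in messages:
        if msg["junk"]:
            msg["folder"] = "Junk"
    return messages


fake_scores = {"friend@example.com": 99, "shop@example.com": 70,
               "spam@example.com": 98, "list@example.com": 5}
batch = [{"sender": sender} for sender in fake_scores]
result = process_batch(batch, {"friend@example.com"},
                       lambda msg: fake_scores[msg["sender"]])
for msg in result:
    print(msg["sender"], msg["folder"])
```

Note that the whitelisted sender stays in the Inbox even though its fake score is 99: it is never scored at all, which is exactly the behavior that the weak whitelisting example has to work around.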

Weak Whitelisting

As a more complex example, spammers are starting to send out emails that have spoofed From addresses that match the domain of your email, figuring that there is a chance that you have these other addresses whitelisted, so you’ll get the spam. To fight this, we’ll set up a filter that does a whitelist that is slightly weaker than the usual all-or-nothing whitelist on those easily spoofed addresses. Because whitelisting occurs before spam processing, and no score will be assigned if the message is whitelisted, you will need to disable the default whitelisting functionality, and rely entirely on message filters for this to work.

We’ll add the following search terms, all of which must match to apply our weak whitelist:

  1. From address appears in an address book (this is normal whitelisting)
  2. My domain appears in the address (because that is easily spoofed)
  3. Junk Score Origin is Plugin (this prevents the filter from running on messages that we classified, in case we run it manually on existing folders).
  4. Junk Status is Junk (we’ll only whitelist if the bayes filter thought it was junk. I only do this so that I can see that the filter decided to override the decision of the bayes processor, which needs the Junk Status + column from JunQuilla to see.)
  5. Junk Percent is less than 95 (since the bayes filter only marked messages as junk with the percent > 90, this means that we will override the bayes decision for messages between 90 and 95 in score).

Putting this all together, you get a filter that looks like this:

WeakWhitelist

You would also need to define a filter that is applied after this one, that whitelists any messages that meet the normal whitelist criteria, but were marked as junk by the bayes filter.
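Written out as a predicate, the five search terms of the weak whitelist look something like the sketch below. This is standalone Python for illustration only: the dictionary keys and function name are hypothetical, the address book is assumed to hold lowercase addresses, and “my domain appears in the address” is approximated as “the address ends with @ plus my domain”.

```python
def weak_whitelist_matches(msg, address_book, my_domain, max_percent=95):
    """Return True only if all five weak-whitelist terms match."""
    sender = msg["sender"].lower()
    return (
        sender in address_book                  # 1. normal whitelisting
        and sender.endswith("@" + my_domain)    # 2. easily spoofed domain
        and msg["origin"] == "plugin"           # 3. classified by the bayes filter
        and msg["junk"]                         # 4. bayes filter said junk
        and msg["percent"] < max_percent        # 5. but not convincingly so
    )


book = {"colleague@example.org"}

borderline = {"sender": "colleague@example.org", "origin": "plugin",
              "junk": True, "percent": 92}
blatant = dict(borderline, percent=97)
manual = dict(borderline, origin="user")

print(weak_whitelist_matches(borderline, book, "example.org"))
print(weak_whitelist_matches(blatant, book, "example.org"))
print(weak_whitelist_matches(manual, book, "example.org"))
```

Only the borderline message (scored between 90 and 95) gets rescued; the blatant one stays junk, and the one that I classified by hand is left alone.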

I’m not necessarily recommending this filter; it was meant as a demonstration. But I hope you can see that the new ability to use the bayes filter in combination with other message criteria in a filter provides lots of new possibilities for more precise handling of possible spam messages.

Extension status for Thunderbird 3.0 beta 3 (and SeaMonkey 2.0 beta 1)

July 19, 2009 – 10:21 pm

I’ve just bumped the allowable Thunderbird (and where applicable SeaMonkey) version on my extensions to allow them to work with the new version numbers in the nightlies and release candidates. I had hoped to have new releases available by now, but have not yet done that.

Major changes are planned for FiltaQuilla (new custom searches) and ToneQuilla (switching to ogg Vorbis as the standard format). JunQuilla should have bug fixes, TaQuilla and GlodaQuilla will be virtually unchanged.

Filtering changes for Thunderbird 3.0 beta 3

July 8, 2009 – 10:37 am

It’s been a long time since I posted a blog, being busy with things I wanted to get into Thunderbird 3.0 beta 3 (and SeaMonkey 2.0 beta 1). Now that we enter the dark days of the freeze prior to the release, I have some time to update extensions to use new features available in beta 3. But I’d like to give details first of changes in the backend areas where I am working, starting with email filtering in this post.

So here are things that are new in Thunderbird 3.0 beta 3 (SeaMonkey 2.0 beta 1) that involve message filtering:

1. IMAP filtering on folders that are not the inbox.

If you have a server filter that moves IMAP messages to a folder other than the inbox, previously you could not use TB filters on it. Now you can, controlled by an inherited folder property. There is no user interface for this feature at the moment, but I will add some to FiltaQuilla. See bug 257415.

2. “From, To, Cc, or Bcc” filter/search term

This is a new search term that can be used in advanced search or filters. It does two things: combines all of the normal address fields into a single term, plus adds support for Bcc for the first time.

The support for Bcc needs some comments. Just in case it isn’t clear, Bcc information is not added to outgoing emails that you send (or the incoming emails that you receive), so the only place it really shows up is in emails that you sent, and have kept your own copy of. So the Bcc term does nothing for emails that you receive, only emails in your sent folder. Also, we currently do not have any mechanism to automatically apply filters to messages that you send, so really the main place Bcc will show up in filter/search is in Advanced search, saved searches (virtual folders), or in manual filters (filter-after-the-fact).
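As a rough picture of what the combined term does, here is a standalone Python sketch (the field names and function are hypothetical, not the search code in mailnews). It checks every address header that is present, and a received message simply has no Bcc header to look at:

```python
ADDRESS_FIELDS = ("from", "to", "cc", "bcc")


def any_address_contains(headers, needle):
    """True if `needle` occurs in any address header that is present."""
    needle = needle.lower()
    return any(needle in headers.get(name, "").lower()
               for name in ADDRESS_FIELDS)


# My own copy of a message I sent: the Bcc header is kept.
sent_copy = {"from": "me@example.org", "to": "team@example.org",
             "bcc": "boss@example.org"}
# The same message as a recipient sees it: no Bcc header at all.
received = {"from": "me@example.org", "to": "team@example.org"}

print(any_address_contains(sent_copy, "boss@example.org"))
print(any_address_contains(received, "boss@example.org"))
```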

3. Junk fields in manual filters

We have now enabled the junk-related search terms of “Junk Status”, “Junk Percent”, and “Junk Score Origin” in manual filters.

There is a subtle change to the behavior of search term editing that was implemented that affected this. Previously, if you caused the filter editor to select an invalid search term (typically by changing the filter context away from “Manual”), that was ignored, and you could save the invalid filter. Now, the invalid terms are grayed out, and attempts to save the filter will result in an error dialog. This behavior was important to allow adding of more complex filtering options.

4. Custom Search Terms

The biggest change that affects filters is the ability for extensions to add custom search terms. This is a complement to the addition of custom filter actions, which were available in beta 2, and are the main point of FiltaQuilla. I expect to do a blog post soon detailing the code necessary for an extension to add a custom search term, plus I will add some custom search terms to FiltaQuilla (probably a regex search on a header).

See bug 495519 for details, including a demonstration extension that adds custom terms.

5. Fix of message copy/move bug.

There has been a long-standing bug involving failed moves or copies, that typically occurred when there were automatic copies or moves of emails, such as are done in filters or junk processing. Bug 497622 hopefully solves this, which should improve the reliability of moves and copies.

What didn’t make beta 3, but should be in TB3.0

There are two big changes that have mostly working patches, that did not make beta 3 but hopefully will be available in beta 4.

1. IMAP body filters.

IMAP messages are now downloaded by default, so in theory it should be easy to filter on their bodies. This is bug 127250 (or possibly the closely related bug 67421). This bug is really critical in combination with the new custom search attributes, as it would be very valuable to write filters that analyze each message body, and then take a particular action when something is found there.

2. Post-bayes incoming filter context

Currently, it is not possible to filter on any of the message characteristics that are determined by the bayes filter (including junk status, and the closely related custom traits) automatically while a message is incoming. This is because the bayes analysis, and its related filtering actions, are applied after the standard filters. A post-bayes filter context will allow the user to specify that a particular filter should be applied after the bayes analysis, not before, and then we will be able to enable the junk-related search terms (and an extension could add a filter term that used a custom bayes trait). See bug 198100.

Do MEALS need a fork?

March 23, 2009 – 4:12 pm

While doing various kinds of marketing research around Mozilla development, I’ve noticed a disturbing trend, which is probably well-known to most of you: Mozillians Earning A Living Somehow (MEALS) often seem to resort to code forks. In the mailnews area, we have Spicebird and Postbox. I’m less familiar with the browser area, but Flock is a similar example. This post from lilmatt describes some of the issues for Flock, also discussed by Daniel Glazman. I was particularly intrigued by lilmatt’s comment:

“If, as an example, Flock were to be implemented as an extension and attempted to say, overwrite the affiliate tags for the search box in the chrome with it’s own to redirect revenue, I think they’d be vilified and perhaps even blocked”

This reminds me of an issue that affects microenterprise loans in developing countries, as pioneered by the Grameen Bank. Sustainability is the holy grail of any microenterprise fund – but the interest rates that must be charged for sustainability are shocking to many people, usually near 40% per year.

Sometimes a well-meaning organization will go into an area, and set up a non-sustainable microenterprise program that charges “reasonable” rates, say 10% per year, and is understanding and forgiving when their poor borrowers can’t repay the loans. That program operates for a while, but inevitably fails. After that happens, the microenterprise community has learned that it is impossible for sustainable microenterprise operations to function in the same region, because they are “vilified” for charging “exorbitant” rates for their loans. Or the borrowers have come to believe, due to the forgiving nature of the well-intentioned organization, that loans are really gifts. This cultural pollution that was inflicted on the region by that well-intentioned organization persists for an entire generation, during which time no real, sustainable loan program can be established, and the local region is deprived of the benefits of microenterprise loan programs. (OK, this is a simplification, but I hope you get my drift.)

I’m afraid we have this kind of cultural pollution in FOSS. As I look at vertical markets, often there are virtually identical applications in both the open-source and commercial spaces. The cultural expectation is that the addon for the FOSS application (like Thunderbird) is free-as-in-beer, but people are perfectly accepting of $10 – $40 charges in the Apple or Microsoft space. You get the sense that some people would be horrified if a Thunderbird addon carried a charge.

But what that means is that the Thunderbird addons are virtually impossible to make sustainable as businesses. So MEALS (Mozillians Earning A Living Somehow) resort to forks, and we lose the benefit of their future efforts. In the fork, they can try to absorb the revenue streams that Mozilla relies on, or reset the cultural environment to be more in alignment with a commercial model.

What a pity. If we could figure out a way to all work together, then the net effect would be so much more powerful. If there was money to be made, we’d have a much more powerful product to present – and the total pie to share would be so much bigger. Yet I fear that the cultural pollution has already occurred, and any attempt to change the mindset will just fail.

KNIFE anyone? (Kent Nicely Introduces Free Extensions, thereby stabbing himself and the rest of you in the back. But please try my extensions!)

rkent

P. S. As an attempt to put my money where my mouth is, I paid $1.50 at istockphoto for the image above. The process took me at least an hour to figure out. After reading all of the legal language I feel like I must be a criminal, and I’m still not sure that I didn’t do something wrong that will cost me my entire retirement savings in liability. There has got to be a friendlier way to get micropayments!