TaQuilla provides automatic “soft” tags for messages

Today I posted TaQuilla in the experimental area of AMO (addons.mozilla.org). TaQuilla extends the tagging features of the Mozilla mailnews products (Thunderbird and SeaMonkey) so that tags are applied automatically, using the same Bayesian filter technology that is used for junk mail processing. TaQuilla requires Thunderbird 3.0 beta 2, which was released today.

Bayesian filters need training, and this happens in the background as you tag or untag messages. Once you “prime the pump” by tagging some messages and untagging a few others, it just works. There are also some diagnostic displays available. One shows the message tokens that were used to arrive at a tagging classification, and another shows, in the message pane, the percent match of each message to a soft tag.
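To make the training step concrete, here is a minimal Python sketch of how per-token statistics could be built up as messages are tagged and untagged. It is not TaQuilla's code: the class name `SoftTagTrainer`, the simple word tokenizer, and the add-one smoothing are all assumptions chosen for illustration.

```python
import re
from collections import Counter


class SoftTagTrainer:
    """Hypothetical trainer: counts how many tagged and untagged
    messages each token appears in."""

    def __init__(self):
        self.tagged_tokens = Counter()
        self.untagged_tokens = Counter()
        self.tagged_messages = 0
        self.untagged_messages = 0

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase words, each counted once per message.
        return set(re.findall(r"[a-z']+", text.lower()))

    def train(self, text, tagged):
        """Record one message as an example of tagged or untagged mail."""
        tokens = self.tokenize(text)
        if tagged:
            self.tagged_tokens.update(tokens)
            self.tagged_messages += 1
        else:
            self.untagged_tokens.update(tokens)
            self.untagged_messages += 1

    def token_probability(self, token):
        """Estimate the chance that a message containing `token` is tagged.

        Add-one smoothing (an assumption) keeps unseen tokens at 0.5
        and avoids probabilities of exactly 0 or 1.
        """
        p_tagged = (self.tagged_tokens[token] + 1) / (self.tagged_messages + 2)
        p_untagged = (self.untagged_tokens[token] + 1) / (self.untagged_messages + 2)
        return p_tagged / (p_tagged + p_untagged)


trainer = SoftTagTrainer()
trainer.train("Worship on Sunday with the pastor", tagged=True)
trainer.train("Thunderbird bug feedback", tagged=False)
print(trainer.token_probability("pastor"))  # above 0.5: leans tagged
print(trainer.token_probability("bug"))     # below 0.5: leans untagged
```

With only two training messages the estimates are weak, which is why a little pump priming in both directions is needed before the classifications become useful.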

I’ve been testing this using a tag “Personal” that I apply to emails that are not associated with my business activities. I have a variety of personal interests, and a variety of business interests, so trying to devise a message filter to separate them would be quite challenging. But TaQuilla is quite effective at separating the two. Then I have a “Personal” and a “Not Personal” virtual folder, so that I can view messages of the appropriate nature depending on whether I am working or not.

To give you some idea of how it works, here are the tokens used to classify a few messages that I recently received. The first was a weekly general mailing from my church’s pastor, classified as Personal. Here’s the analysis:

TaQuilla Analysis of a church email

This is the detail view of a message from TaQuilla. The “Token” column shows words from the message that participated in the analysis. “Token %” is the probability that a message containing that token should be tagged as Personal. The “Running %” column is the total percent match of the message, as a running total starting with the strongest tokens.

The 99% at the bottom of the “Running %” column shows a very high match to Personal. Words like “pastor”, “worship”, “Sunday” strongly matched Personal, so it was automatically tagged Personal.
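The “Running %” column can be illustrated with a short Python sketch. This post does not describe the combining rule TaQuilla actually uses, so the sketch substitutes a plain naive Bayes combination in log-odds form. The function name `running_match` and the sample token probabilities are made up for illustration.

```python
import math


def running_match(token_probs):
    """Return (token, token_probability, running_probability) rows,
    strongest tokens first.

    Assumption: tokens are combined with a naive Bayes log-odds sum,
    which stands in for whatever rule the real filter applies.
    """
    # "Strongest" means farthest from the undecided value of 0.5.
    ranked = sorted(token_probs.items(),
                    key=lambda kv: abs(kv[1] - 0.5),
                    reverse=True)
    log_odds = 0.0
    rows = []
    for token, p in ranked:
        # Clamp so that log() never sees exactly 0 or 1.
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        log_odds += math.log(p) - math.log(1.0 - p)
        running = 1.0 / (1.0 + math.exp(-log_odds))
        rows.append((token, p, running))
    return rows


# Made-up token probabilities, not values taken from the screenshots.
sample = {"pastor": 0.95, "worship": 0.90, "the": 0.50}
for token, p, running in running_match(sample):
    print(f"{token:10s} token={p:.0%} running={running:.1%}")
```

In this sketch a token at 50% leaves the running total unchanged, while each strongly Personal token pushes the total closer to 100%.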

A second example email was a posting on the thunderbird-testing list that was not tagged as Personal. Its analysis is:

TaQuilla analysis of a Thunderbird-related email

Here you have words like “Feedback”, “bug”, and “Thunderbird” which are strongly not-Personal.

And finally, we have a strange one – a message from my son asking a question about spam processing. What did TaQuilla do with this?

TaQuilla analysis of a mixed email

Well, it could not decide. The percent match was 51%, which was marked as Personal because the cutoff is 50%, but it’s pretty clear that this is a confused email.
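The cutoff rule can be written down in a few lines of Python. The 50% cutoff comes from the description above. The function name `soft_tag_decision`, the treatment of a score of exactly 50%, and the `confused_margin` band that flags borderline messages are assumptions added for illustration, not features of TaQuilla.

```python
def soft_tag_decision(percent_match, cutoff=50.0, confused_margin=5.0):
    """Return (tagged, confused) for a percent match.

    Assumptions: a score exactly at the cutoff is not tagged, and any
    score within `confused_margin` points of the cutoff counts as
    borderline.
    """
    tagged = percent_match > cutoff
    confused = abs(percent_match - cutoff) <= confused_margin
    return tagged, confused


print(soft_tag_decision(51.0))  # (True, True): tagged, but borderline
print(soft_tag_decision(99.0))  # (True, False): a confident match
```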

At present, soft tagging only supports email, and it is either on or off for the entire application. By the time Thunderbird 3 beta 3 is available, I expect to have enough hooks in the backend code to also support RSS feeds and News, as well as finer control over which accounts and folders have soft tags applied. I’m hoping that I can use soft tagging on large-volume feeds such as Planet Mozilla, to automatically flag posts that are “Interesting”. (I’ll incidentally provide junk management for RSS and news in the same patches.)
