Have you considered using MaxEnt (maximum entropy) instead of what I presume is Naive Bayes?
Although this would probably make training a lot more time-intensive, it could be done in the background (as most users' resource usage is minimal) and in batches, once a sufficient number of new e-mails have been classified.
Then features like number of links in an e-mail, average length of sentences, etc. could be incorporated, which would probably improve the result.
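To make the idea concrete, here is a rough sketch of what I have in mind. For two classes, MaxEnt is the same thing as logistic regression, so this trains a tiny logistic model on two hand-made features (link count and average sentence length). Everything here is my own illustration and not how the actual classifier works: the function names, the feature scaling, the learning rate and the toy messages are all assumptions.

```python
import math
import re

URL_RE = re.compile(r"https?://\S+")

def extract_features(text):
    """Return [bias, link count, average sentence length in words / 10]."""
    links = len(URL_RE.findall(text))
    # Strip URLs first so the dots inside them are not counted as sentence ends.
    stripped = URL_RE.sub(" ", text)
    sentences = [s for s in re.split(r"[.!?]+", stripped) if s.strip()]
    words = sum(len(s.split()) for s in sentences)
    avg_len = words / len(sentences) if sentences else 0.0
    # Dividing by 10 is an arbitrary scaling to keep the features comparable.
    return [1.0, float(links), avg_len / 10.0]

def predict(weights, feats):
    """Probability that the message is spam (binary MaxEnt = logistic regression)."""
    z = sum(w * x for w, x in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=500, lr=0.1):
    """Plain stochastic gradient ascent on the log-likelihood."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for text, label in examples:
            feats = extract_features(text)
            error = label - predict(weights, feats)
            weights = [w + lr * error * x for w, x in zip(weights, feats)]
    return weights

# Toy training data, made up for illustration (1 = spam, 0 = not spam).
EXAMPLES = [
    ("Buy now http://a.example http://b.example Click http://c.example", 1),
    ("Win big! Visit http://d.example and http://e.example today!", 1),
    ("Hi Bob, are we still meeting on Thursday to review the quarterly "
     "numbers? Let me know what time works for you.", 0),
    ("Thanks for sending the draft over. I read through it last night and "
     "left a few comments in the margins.", 0),
]

if __name__ == "__main__":
    weights = train(EXAMPLES)
    for text, label in EXAMPLES:
        print(label, round(predict(weights, extract_features(text)), 3))
```

In a real setup these features would sit alongside the usual word features, and the retraining could run as the background batch job mentioned above.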
Otherwise, thumbs up for this. I almost never use betas, but this pretty much compels me to go for it.