My project to provide Exchange Web Services (EWS) support to applications based on the Mozilla mailnews codebase entered a new phase this week, where I am starting to consider the issue of local persistence of data downloaded from the server. (In the previous week, I got two other things working: display of HTML emails, and updating of UNREAD status from the local app to the server).
EWS messages do not come from the server in RFC-822 format, so it seems like a pity to store them that way, though that is the common method used in the rest of the mailnews codebase. Instead, I decided to implement a local storage scheme based on SQLite and Mozilla’s Storage interface. Andrew Sutherland has done a lot of great work setting up an environment similar to this for the gloda database, so there are lots of good examples to pull from. Also, because the data model for EWS includes not only messages but also Calendar and Contact items, I can have a common database infrastructure that I can leverage for those other pieces once I get it working for the messaging part.
I’ve now replaced my previous in-memory datastore for message metadata with an SQLite version. This is equivalent to the datastore module in gloda, and the data it is storing is like the RFC-822 headers. I still have to do the storage for the body, and also hook this up with folder change state so that the code knows that it has data it can trust.
As I have done this, I’ve had a new set of insights into the relationship of the various objects in the Mozilla mailnews world (which I sometimes call Skink). Previously, I had sort of expected that the natural progression of gloda would be to slowly displace the role of the message summary database, nsIMsgDBHdr. But now I see that a more natural progression would be for SQLite to be used as a replacement for the local mailstore (currently mbox, with maildir support moving forward as well). Really the main issue is the async nature of the SQLite calls, which sort of precludes its easy use as a replacement for nsIMsgDBHdr. But the datastores are typically accessed async anyway. If the message metadata in the message stores was stored primarily in SQLite format, as I will be doing, then it would be much easier to hook up an SQLite-based global search facility to all of these databases. Yes, that is what gloda does now, but it has to go through all of the work to maintain a separate version of everything. Why have three copies of everything (Mork, MBox, and Gloda) when you could have only two (Mork and Gloda)?
As another insight, while looking through the gloda code I noticed that a JSON object was being saved to store some of the items. I thought that was a good idea at first, but then I tried to write a simple serializer to convert from my internal native format to JSON objects, and saw that it was not going to be an easy project. But then I remembered that SOAP is really just a mechanism to serialize typed objects, and I already have a SOAP encoder and decoder! So instead of using JSON, I use objects serialized with my SOAP XML encoder to store unindexed items in my SQLite store. So a message (sans body) ends up looking like this as a TEXT item in SQLite:
<Message xmlns="http://schemas.microsoft.com/exchange/services/2006/types">
  <Subject>Postini First Junk Email Safely Quarantined</Subject>
  <DateTimeReceived>2010-06-04T22:19:30Z</DateTimeReceived>
  <Size>2612</Size>
  <Importance>Normal</Importance>
  <DisplayTo>Kent James</DisplayTo>
  <Culture>en-US</Culture>
  <Sender>
    <Mailbox>
      <Name>Postini Support</Name>
      <EmailAddress>noreply@hostedmsexchange.com</EmailAddress>
      <RoutingType>SMTP</RoutingType>
    </Mailbox>
  </Sender>
  <ToRecipients>
    <Mailbox>
      <Name>Kent James</Name>
      <EmailAddress>rkentjames@caspia.org</EmailAddress>
      <RoutingType>SMTP</RoutingType>
    </Mailbox>
  </ToRecipients>
  <From>
    <Mailbox>
      <Name>Postini Support</Name>
      <EmailAddress>noreply@hostedmsexchange.com</EmailAddress>
      <RoutingType>SMTP</RoutingType>
    </Mailbox>
  </From>
  <InternetMessageId>&lt;0c34b5a4-5f3c-4654-bf9d-99c9a8cb439b@HUB02.4emm.local&gt;</InternetMessageId>
  <IsRead>1</IsRead>
</Message>
At first it bothered me to save what is essentially a duplicate of what is coming over the wire, but why not? It’s not conceptually any different than RFC-822, or JSON, in function.
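To make the idea concrete, here is a minimal sketch in Python rather than in the actual mozStorage/C++ code. It assumes a hypothetical `items` table in which a few fields get their own indexed columns and everything else rides along as one serialized XML string in a TEXT column. The table layout and the function names are illustrative only, and the flat property dictionary is a simplification of the nested EWS types.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespace taken from the sample message above.
EWS_TYPES_NS = "http://schemas.microsoft.com/exchange/services/2006/types"


def serialize_message(props):
    """Serialize a flat dict of string properties to a <Message> XML string."""
    ET.register_namespace("", EWS_TYPES_NS)  # emit xmlns="..." with no prefix
    root = ET.Element(f"{{{EWS_TYPES_NS}}}Message")
    for name, value in props.items():
        child = ET.SubElement(root, f"{{{EWS_TYPES_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def open_store(path=":memory:"):
    """Open the store and create the (hypothetical) items table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " item_id TEXT PRIMARY KEY,"    # indexed key for single-record lookups
        " folder_id TEXT NOT NULL,"     # indexed-style column, kept outside the XML
        " is_read INTEGER NOT NULL,"
        " properties TEXT NOT NULL)"    # unindexed properties as serialized XML
    )
    return conn


def put_item(conn, item_id, folder_id, props):
    """Insert or overwrite one item; the XML text holds the full property set."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO items VALUES (?, ?, ?, ?)",
            (item_id, folder_id, int(props.get("IsRead", 0)),
             serialize_message(props)),
        )


def get_item(conn, item_id):
    """Return the stored properties as a dict, or None if the id is unknown."""
    row = conn.execute(
        "SELECT properties FROM items WHERE item_id = ?", (item_id,)
    ).fetchone()
    if row is None:
        return None
    root = ET.fromstring(row[0])
    # Strip the "{namespace}" prefix from each child tag.
    return {child.tag.split("}", 1)[1]: child.text for child in root}
```

The serializer escapes values such as the angle brackets in a message id, so whatever goes into the TEXT column should come back out unchanged through the decoder.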

Cool!
You can use SQLite synchronously via mozStorage if you want. It’s only a serious problem when you have potentially long-running operations. As long as you stay away from using the full-text search mechanism or complex queries you should be sufficiently fine.
Note that I’m not suggesting that it’s a good idea long term, but if your goal is to work with nsMsgDBView and existing message reader and friends, it’s a better idea than the convolutions that would be required to try and make them work with async.
Vision-wise, my plan is for new account/message types to just exist at the gloda level and never touch the ‘skink’ level, as you so dub it. The work to support that is under way and not proven yet, so nothing is set in stone. Functionality developed at the skink level should still be able to be lofted to the gloda level as will continue to be the case for POP/IMAP for at least the medium term.
“You can use SQLite synchronously via mozStorage if you want”
In my implementation, I’m allowing sync grabbing of single records from a simple indexed key, but not of the queries (like “get a list of all item ids that need updating”). I was going to ask you if you thought that was OK, since the mozStorage MDC article “strongly discourages” that. It sounds like you think it is.
“account/message types to just exist at the gloda level”
That must mean that you are trying to get the standard display logic (folder and message panes) to work with some sort of message abstraction other than nsIMsgDBHdr objects. It would be great if you could blog about that.
In terms of synchronous stuff…
Yeah, efficient single record lookups should generally result in acceptable performance as long as the header cache is large. The trick is that if you ever want to do more expensive queries, you have to be very careful not to do things in such a way that you block your main-thread synchronous reader.
Probably the safest way to accomplish that would be to use a combination of 1) using a read-only second connection for long running reads when the user is non-idle and 2) only ever performing writes asynchronously on your ‘main’ connection and when the user is idle.
The rationale is that multiple connections can be open and performing concurrent read-only operations without interfering with each other. A single connection can be used on multiple threads, but a mutex means that only one thread can be in a sqlite3_step at a time, which means that a complex query on the async thread (or any other thread) could block the main thread, even if the query is read-only.
From a write perspective, you definitely do not want to be performing writes on the main thread because you will clog up the event loop. It’s potentially inadvisable to perform writes from a different connection than your main one just because the file locking makes all reads impossible until the transaction has completed (which again will likely result in blocking the main thread), whereas if you’re writing from your main connection then you have a chance to get a read operation in from the main thread even while stuff is happening on the other thread. (Although you are then in danger of getting inconsistent reads.)
So, to summarize, doing any synchronous access from the main thread is not really a great idea, but if you’re working with the existing UI framework, you don’t have much of a choice.
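As a rough illustration of the two-connection arrangement, here is a sketch using Python's standard `sqlite3` module in place of mozStorage. The `items` table, the `needs_update` column, and the function names are all hypothetical. It only shows the shape of the idea: cheap keyed lookups on the main connection, and the list-style query on a separate connection opened read-only.

```python
import os
import sqlite3
import tempfile


def open_main(path):
    """Main connection: keyed single-record reads, and all writes."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " item_id TEXT PRIMARY KEY,"
        " needs_update INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def open_read_only(path):
    """Second connection for long-running reads; SQLite rejects writes on it."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def lookup(main_conn, item_id):
    """Cheap lookup by primary key, the kind one might tolerate synchronously."""
    return main_conn.execute(
        "SELECT item_id, needs_update FROM items WHERE item_id = ?", (item_id,)
    ).fetchone()


def ids_needing_update(ro_conn):
    """Potentially expensive list query, kept off the main connection."""
    rows = ro_conn.execute(
        "SELECT item_id FROM items WHERE needs_update = 1 ORDER BY item_id"
    ).fetchall()
    return [row[0] for row in rows]


def make_demo_db():
    """Create a small throwaway database file and return its path."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
    main = open_main(path)
    with main:
        main.executemany(
            "INSERT INTO items VALUES (?, ?)", [("a", 1), ("b", 0), ("c", 1)]
        )
    main.close()
    return path
```

This does not reproduce the threading or idle-time scheduling described above; it only separates the connections so that the read-only one can never start a write transaction.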
Thanks for the response. Based on that, I’ll definitely want to avoid sync in my implementation of native object persistence, since I am not forced down that path by a lot of existing code.
It’s worth noting that the SQLite 3.7.0 development series has a new log mechanism that will allow multiple reads to happen without delay even while a writer is active. The problem is that its release may not be imminent and it’s pretty much guaranteed that it will never land on 1.9.2. Even 1.9.3 might be a stretch depending on when 3.7 is released and how willing Firefox is to take a chance on it / whether Firefox will gain an immediate benefit.
Which is just to say, if you’re not planning to ship against Thunderbird 3.1, don’t write off SQLite changing the game.
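For reference, the log mechanism in question shipped in SQLite 3.7.0 as write-ahead logging, enabled with `PRAGMA journal_mode=WAL`. The sketch below uses Python's `sqlite3` module and a made-up table `t` to show the behavior: a second connection can still read the last committed state while a writer has an open transaction. The function names are illustrative, and it assumes a file-backed database on a filesystem where WAL is supported.

```python
import os
import sqlite3
import tempfile


def open_wal(path):
    """Open a connection in autocommit mode and switch the database to WAL."""
    conn = sqlite3.connect(path, isolation_level=None)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode


def count_rows(conn):
    """Count rows in t; fetchall() runs the statement to completion."""
    return conn.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]


def demo():
    """Return the reader's row count during and after an open write transaction."""
    path = os.path.join(tempfile.mkdtemp(), "wal-demo.sqlite")
    writer, _ = open_wal(path)
    writer.execute("CREATE TABLE t (x INTEGER)")
    writer.execute("INSERT INTO t VALUES (1)")
    reader, _ = open_wal(path)

    writer.execute("BEGIN IMMEDIATE")          # writer takes the write lock
    writer.execute("INSERT INTO t VALUES (2)")
    during = count_rows(reader)                # reader is not blocked by the writer
    writer.execute("COMMIT")
    after = count_rows(reader)                 # new read sees the committed row

    writer.close()
    reader.close()
    return during, after
```

While the write transaction is open, the reader sees only the committed row; after the commit, a fresh read picks up the new one.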
In terms of displaying non-nsIMsgDBHdr stuff, you are right that I want to display non-nsIMsgDBHdr stuff. But it won’t be done using the current folder, thread pane, or message reader implementations. It is a goal to enable familiar-looking displays along the same lines with similar (or better) performance characteristics, but the current code implementation has certain deeply-rooted limitations that are not suitable for other performance and presentation goals.
It’s the kind of thing I am reluctant to blog about until I am further along in the implementation process. At the current stage, it is a speculative and contentious plan whose most likely immediate outcome would be at least a low-level flame war.
Which is to say, it would likely make a good number of people unhappy and concerned about where Thunderbird is (or at least I am) headed without any evidence to reassure them or help them change their minds. The people who would be enthused by the direction would likely be happy, but things are not yet at a stage where anyone can really contribute meaningfully. Also, I feel like I’ve already expressed the gist of my gameplan to those who would be enthused. For example, if you found this strategy on my part wholly surprising, I would, in turn, be surprised.
It’s very important to note that this is not going to be a short-term kind of effort and the 3-pane view as it currently exists is certainly not going away anytime I can see. In fact, bienvenu’s thinking is much more aligned with your own than mine, and he’s pretty bad-ass, so we could end up in a situation where the 3-pane and ‘skink’ in general evolve to the next level and continue to exist much as they do today.
I don’t really understand your reluctance to blog for fear of a flame war. I’ve said before, and I will repeat again, that Skink development is sorely lacking vision at the moment IMHO, at least publicly (and I am part of “the public” here who does not see the vision). Spirited discussions are part of the process of developing vision. Skink is much more likely to die a slow death for lack of vision than to die because of excessive controversy over that vision. Take a risk. Expect some negative responses, and try to learn from those responses.
My concern about the flame war is secondary. My primary concern is that I am a firm believer in “show me the code” and telling people about things when they can use them or meaningfully contribute to them. That moment is several weeks out.
Which is not to say it will be a “surprise! here’s a fully formed implementation” kind of thing, just that it will be “here’s a coherent set of ideas, some of which are already working and all suggest the proposed future steps are not insane, and here’s what you can do to provide insight/feedback/hack on things.”
I should probably also point out that I’m expecting to see you, bienvenu, jcranmer, and other contributors in a week in-person at the mozilla summit. I want to leverage that high-bandwidth reduced-chance-for-misunderstandings environment. If that wasn’t coming up, I would be much more likely to solicit public feedback. (I’m actually working on different stuff (UI framework) between now and then, so there’s little lost opportunity.)