Data Persistence (Mailnews Exchange Support)

By | June 25, 2010

My project to provide Exchange Web Services (EWS) support to applications based on the Mozilla mailnews codebase entered a new phase this week: I am starting to consider local persistence of data downloaded from the server. (In the previous week, I got two other things working: display of HTML emails, and updating of UNREAD status from the local app to the server.)

EWS messages do not come from the server in RFC-822 format, so it seems like a pity to store them that way, though that is the common method used in the rest of the mailnews codebase. Instead, I decided to implement a local storage scheme based on SQLite and Mozilla’s Storage interface. Andrew Sutherland has done a lot of great work setting up a similar environment for the gloda database, so there are lots of good examples to pull from. Also, because the data model for EWS includes not only messages but also Calendar and Contact items, I can build a common database infrastructure and reuse it for those other pieces once I get it working for the messaging part.

I’ve now replaced my previous in-memory datastore for message metadata with an SQLite version. This is equivalent to the datastore module in gloda, and the data it is storing is like the RFC-822 headers. I still have to do the storage for the body, and also hook this up with folder change state so that the code knows that it has data it can trust.
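To give a feel for what such a metadata store involves, here is a minimal sketch using Python's sqlite3 module rather than the Mozilla Storage interface. The table name, column names, and helper functions are all hypothetical, chosen for illustration only; they are not the actual schema.

```python
import sqlite3

# Hypothetical schema: one row per message, with header-like fields that
# need to be sorted or filtered kept in their own columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS message_meta (
    item_id    TEXT PRIMARY KEY,   -- server-side item identifier
    folder_id  TEXT NOT NULL,
    subject    TEXT,
    sender     TEXT,
    received   TEXT,               -- ISO 8601 timestamp, sorts as text
    is_read    INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_meta_folder
    ON message_meta (folder_id, received);
"""


def open_store(path=":memory:"):
    """Open (or create) the metadata store and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_message(conn, item_id, folder_id, subject, sender, received, is_read):
    """Insert a message row, replacing any earlier row with the same item_id."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO message_meta VALUES (?, ?, ?, ?, ?, ?)",
            (item_id, folder_id, subject, sender, received, int(is_read)),
        )


def mark_read(conn, item_id, is_read=True):
    """Update the read flag locally (pushing it to the server is not shown)."""
    with conn:
        conn.execute(
            "UPDATE message_meta SET is_read = ? WHERE item_id = ?",
            (int(is_read), item_id),
        )


def list_folder(conn, folder_id):
    """Return (subject, sender, is_read) rows for a folder, newest first."""
    return conn.execute(
        "SELECT subject, sender, is_read FROM message_meta "
        "WHERE folder_id = ? ORDER BY received DESC",
        (folder_id,),
    ).fetchall()
```

Passing a file path to open_store instead of the default ":memory:" is what makes the data survive a restart, which is the point of moving away from an in-memory store.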

As I have done this, I’ve had a new set of insights into the relationship of the various objects in the Mozilla mailnews world (which I sometimes call Skink). Previously, I had sort of expected that the natural progression of gloda would be to slowly displace the role of the message summary database, nsIMsgDBHdr. But now I see that a more natural progression would be for SQLite to be used as a replacement for the local mailstore (currently mbox, with maildir support moving forward as well). Really the main issue is the async nature of the SQLite calls, which precludes its easy use as a replacement for nsIMsgDBHdr. But the datastores are typically accessed async anyway. If the message metadata in the message stores were stored primarily in SQLite format, as I will be doing, then it would be much easier to hook up an SQLite-based global search facility to all of these databases. Yes, that is what gloda does now, but it has to go through all of the work to maintain a separate version of everything. Why have three copies of everything (Mork, MBox, and Gloda) when you could have only two (Mork and Gloda)?
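One way a search across several SQLite-backed stores could work is SQLite's ATTACH DATABASE, which lets a single connection query more than one database. The sketch below uses Python's sqlite3 module; the schema name ews_store, the messages table, and both function names are assumptions made for illustration, and this is not how gloda itself works.

```python
import sqlite3


def open_combined():
    """Open one connection that can see two separate message stores.

    Both stores are in-memory here; in practice each would be its own file.
    """
    conn = sqlite3.connect(":memory:")
    # ATTACH cannot run inside a transaction, so do it before any writes.
    conn.execute("ATTACH DATABASE ':memory:' AS ews_store")
    conn.execute("CREATE TABLE main.messages (subject TEXT, sender TEXT)")
    conn.execute("CREATE TABLE ews_store.messages (subject TEXT, sender TEXT)")
    return conn


def search_all(conn, term):
    """Search subjects in both stores with one query, tagging each hit."""
    pattern = f"%{term}%"
    return conn.execute(
        """
        SELECT 'local' AS store, subject FROM main.messages
            WHERE subject LIKE ?
        UNION ALL
        SELECT 'ews' AS store, subject FROM ews_store.messages
            WHERE subject LIKE ?
        ORDER BY store, subject
        """,
        (pattern, pattern),
    ).fetchall()
```

Because each store keeps its own data and the query simply spans them, nothing has to be copied into a separate search database first.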

As another insight, while looking through the gloda code I noticed that a JSON object was being saved to store some of the items. I thought that was a good idea at first – but then I tried to write a simple serializer to convert from my internal native format to JSON objects, and saw that it was not going to be an easy project. But then I remembered that SOAP is really just a mechanism to serialize typed objects, and I already have a SOAP encoder and decoder! So instead of using JSON, I use objects serialized with my SOAP XML encoder to store unindexed items in my SQLite store. So a message (sans body) ends up looking like this as a TEXT item in SQLite:

<Message xmlns="http://schemas.microsoft.com/exchange/services/2006/types">
 <Subject>Postini First Junk Email Safely Quarantined</Subject>
 <DateTimeReceived>2010-06-04T22:19:30Z</DateTimeReceived>
 <Size>2612</Size>
 <Importance>Normal</Importance>
 <DisplayTo>Kent James</DisplayTo>
 <Culture>en-US</Culture>
 <Sender>
  <Mailbox>
   <Name>Postini Support</Name>
   <EmailAddress>noreply@hostedmsexchange.com</EmailAddress>
   <RoutingType>SMTP</RoutingType>
  </Mailbox>
 </Sender>
 <ToRecipients>
  <Mailbox>
   <Name>Kent James</Name>
   <EmailAddress>rkentjames@caspia.org</EmailAddress>
   <RoutingType>SMTP</RoutingType>
  </Mailbox>
 </ToRecipients>
 <From>
  <Mailbox>
   <Name>Postini Support</Name>
   <EmailAddress>noreply@hostedmsexchange.com</EmailAddress>
   <RoutingType>SMTP</RoutingType>
  </Mailbox>
 </From>
 <InternetMessageId>&lt;0c34b5a4-5f3c-4654-bf9d-99c9a8cb439b@HUB02.4emm.local&gt;</InternetMessageId>
 <IsRead>1</IsRead>
</Message>

At first it bothered me to save what is essentially a duplicate of what is coming over the wire, but why not? In function, it is conceptually no different from RFC-822 or JSON.
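The general pattern of keeping a serialized XML item in a TEXT column, with only the searchable fields pulled out into columns of their own, can be sketched as follows. This uses Python's sqlite3 and xml.etree.ElementTree modules instead of the SOAP encoder and Mozilla Storage; the items table and the helper names are hypothetical, and only the namespace and element names are taken from the example above.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespace taken from the serialized message shown above.
NS = "http://schemas.microsoft.com/exchange/services/2006/types"

# A shortened version of the sample message, for demonstration.
SAMPLE = (
    f'<Message xmlns="{NS}">'
    "<Subject>Postini First Junk Email Safely Quarantined</Subject>"
    "<DateTimeReceived>2010-06-04T22:19:30Z</DateTimeReceived>"
    "<IsRead>1</IsRead>"
    "</Message>"
)


def field(xml_text, name):
    """Return the text of a direct child element, or None if it is absent."""
    node = ET.fromstring(xml_text).find(f"{{{NS}}}{name}")
    return node.text if node is not None else None


def open_items_store(path=":memory:"):
    """Create a hypothetical table: one indexed column plus the raw XML."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "item_id TEXT PRIMARY KEY, subject TEXT, properties TEXT)"
    )
    return conn


def save_item(conn, item_id, xml_text):
    """Store the XML as-is, copying the subject out so it can be queried."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO items (item_id, subject, properties) "
            "VALUES (?, ?, ?)",
            (item_id, field(xml_text, "Subject"), xml_text),
        )


def load_item(conn, item_id):
    """Return the stored XML text for an item, or None if it is not stored."""
    row = conn.execute(
        "SELECT properties FROM items WHERE item_id = ?", (item_id,)
    ).fetchone()
    return row[0] if row else None
```

With this layout, a lookup such as field(load_item(conn, "item-1"), "IsRead") decodes a property on demand, while anything that must be searched in SQL has to live in a column of its own.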