Subject Regex and non-ASCII chars
May 12, 2010
5:43 am
Member
Forum Posts: 7
Member Since:
May 12, 2010
Offline

I have been using FiltaQuilla with great pleasure, and Subject regex matching is wonderful. Thanks a lot!

 

However… I have a simple regex which is intended to match a French word with a non-ASCII char:  /raté/i

… but this does not seem to work at all. :-(

 

Replacing the "é" by "…" seems to work in some cases, e.g. for emails where the Subject header is encoded in ISO-8859-1 as "Subject: =?iso-8859-1?Q?rat=E9?=".  But this workaround does not work for UTF-8 encoded subjects (e.g. "Subject: =?UTF-8?B?cmF0w6k=?=").

By contrast, Thunderbird's basic string matching for Subject does work as expected for subjects encoded in either of these ways.

 

Could it be that FiltaQuilla works on the non-decoded strings?

If so, it would be great to have that fixed!

(and I guess it applies to To, From and Cc headers too)
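To illustrate the suspicion, here is a minimal sketch of what the regex would see if it were run against the raw header value rather than the decoded text. The function `decodeEncodedWord` is a hypothetical helper written for this post, not part of FiltaQuilla or Thunderbird; it only handles a single encoded-word and only the two charsets mentioned above.

```javascript
// Hypothetical helper: decodes ONE RFC 2047 encoded-word of the form
// "=?charset?encoding?text?=". Anything it does not understand is
// returned unchanged. Only utf-8 and iso-8859-1 are handled here.
function decodeEncodedWord(word) {
  const m = /^=\?([^?]+)\?([bBqQ])\?([^?]*)\?=$/.exec(word);
  if (!m) return word; // not an encoded-word at all
  const charset = m[1].toLowerCase();
  const encoding = m[2].toUpperCase();
  const text = m[3];

  let bytes;
  if (encoding === "B") {
    // "B" encoding: the payload is base64
    bytes = Uint8Array.from(atob(text), (c) => c.charCodeAt(0));
  } else {
    // "Q" encoding: "_" stands for a space, "=XX" for one hex-encoded byte
    const raw = text
      .replace(/_/g, " ")
      .replace(/=([0-9A-Fa-f]{2})/g, (_, hex) =>
        String.fromCharCode(parseInt(hex, 16)));
    bytes = Uint8Array.from(raw, (c) => c.charCodeAt(0));
  }

  if (charset === "utf-8") return new TextDecoder("utf-8").decode(bytes);
  // ISO-8859-1 bytes map one-to-one onto the first 256 code points
  if (charset === "iso-8859-1") return String.fromCharCode(...bytes);
  return word; // charset not covered by this sketch
}

// The two raw Subject values from this post, and the regex (é is U+00E9)
const rawLatin1 = "=?iso-8859-1?Q?rat=E9?=";
const rawUtf8 = "=?UTF-8?B?cmF0w6k=?=";
const accentRegex = /rat\u00e9/i;

console.log(accentRegex.test(rawLatin1));                    // false
console.log(accentRegex.test(rawUtf8));                      // false
console.log(accentRegex.test(decodeEncodedWord(rawLatin1))); // true
console.log(accentRegex.test(decodeEncodedWord(rawUtf8)));   // true
```

If the regex really is applied to the raw value, this would explain why nothing with an accent can match: the "é" never appears literally in either encoded form.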

 

Thanks

May 12, 2010
1:30 pm
Admin
Forum Posts: 323
Member Since:
July 12, 2008
Offline

I'm going to have to check a little more to find out whether the "subject" variable I am using has been decoded or not. I thought it had, but looking at some other code I am starting to doubt that.

Anyway I'll add this as a bug to investigate, hopefully before the next release.

May 18, 2010
1:50 am
Member
Forum Posts: 7
Member Since:
May 12, 2010
Offline

OK, don't hesitate to ping me if you want me to test!

Thank you.

July 2, 2010
5:57 am
Member
Forum Posts: 7
Member Since:
May 12, 2010
Offline

After digging a bit through the Mozilla developer documentation, I found this:

(from https://developer.mozilla.org/en/XPCOM_Interface_Reference/nsIMsgDBHdr)

subject (string): Indicates the subject of this message; the equivalent header is the Subject: header. The value here will effectively be the unparsed header content, so it will contain full MIME-encoded syntax.

…

mime2DecodedAuthor (AString, read-only)

mime2DecodedSubject (AString, read-only)

mime2DecodedRecipients (AString, read-only)

 

Knowing this, I cooked up the following patch, which fixes the subject matching issue for me:

 

--- filtaquilla.js.orig    2010-07-02 15:34:42.000000000 +0200
+++ filtaquilla.js    2010-07-02 15:37:57.000000000 +0200
@@ -834,7 +834,7 @@
       },
       match: function subjectRegEx_match(aMsgHdr, aSearchValue, aSearchOp)
       {
-        var subject = aMsgHdr.subject;
+        var subject = aMsgHdr.mime2DecodedSubject;
         let searchValue;
         let searchFlags;
         [searchValue, searchFlags] = _getRegEx(aSearchValue);

 

I can now use accents in my subject matching regexps.

 

Could you consider including this patch in a later revision?

 

Note that you may also want to generalise this fix to other parts of the code, such as:

 if (/@SUBJECT@/.test(parameter))
-      return parameter.replace(/@SUBJECT@/, hdr.subject);
+      return parameter.replace(/@SUBJECT@/, hdr.mime2DecodedSubject);

and likewise for the author and recipients headers.
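As a rough sketch of what that generalisation might look like, the snippet below reads only the decoded properties, both for regex matching and for placeholder expansion. Everything here is an assumption for illustration: `fakeHdr` is a plain object standing in for an nsIMsgDBHdr, `subjectMatches` and `expandPlaceholders` are made-up names, and the `@AUTHOR@` and `@RECIPIENTS@` placeholders are hypothetical (only `@SUBJECT@` appears in the code I looked at).

```javascript
// Plain object standing in for an nsIMsgDBHdr (hypothetical test data).
const fakeHdr = {
  subject: "=?UTF-8?B?cmF0w6k=?=",                 // raw, MIME-encoded
  mime2DecodedSubject: "rat\u00e9",                // decoded: "raté"
  mime2DecodedAuthor: "Zo\u00e9 <zoe@example.org>",
  mime2DecodedRecipients: "list@example.org",
};

// Regex matching against the decoded subject, as in the patch above.
function subjectMatches(hdr, pattern, flags) {
  return new RegExp(pattern, flags).test(hdr.mime2DecodedSubject);
}

// Placeholder name -> decoded header property.
// Only @SUBJECT@ comes from the existing code; the other two are assumed.
const DECODED_FIELDS = {
  "@SUBJECT@": "mime2DecodedSubject",
  "@AUTHOR@": "mime2DecodedAuthor",
  "@RECIPIENTS@": "mime2DecodedRecipients",
};

// Expand every known placeholder using the decoded value. A callback is
// used as the replacement so that "$" sequences inside a header value
// are inserted literally instead of being treated as replacement patterns.
function expandPlaceholders(parameter, hdr) {
  return parameter.replace(
    /@(?:SUBJECT|AUTHOR|RECIPIENTS)@/g,
    (token) => hdr[DECODED_FIELDS[token]]
  );
}

console.log(subjectMatches(fakeHdr, "rat\u00e9", "i"));
console.log(expandPlaceholders("Fwd: @SUBJECT@ (from @AUTHOR@)", fakeHdr));
```

One design point: passing the header value directly as the second argument of `replace`, as the current code does, would interpret sequences such as `$&` inside a subject as replacement patterns. Using a callback avoids that.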

 

July 2, 2010
6:40 am
Admin
Forum Posts: 323
Member Since:
July 12, 2008
Offline

I'll be happy to include that in a future release. Thanks for investigating this!

Forum Timezone: UTC -8
