Whole word matching?

t2m (Member), May 12, 2010, 5:53 am

I use FiltaQuilla's regex matching to match whole words in subjects, with a regex such as:

/(^|[^A-Za-z])myword([^A-Za-z]|$)/i

This is very useful for matching short acronyms in subjects without matching common words that happen to contain them (e.g. to avoid matching "NATO" in "denominator"), and without missing the acronyms when they appear at the start or end of a subject (which is the problem if you match on " XXX ", with spaces around the acronym).

But since this regex syntax is heavy, unreadable and error-prone, I'd like to suggest adding this kind of whole-word matching on the subject as a criterion in its own right. It could be named "Subject: whole word", with options such as "is contained" and "is not contained", followed by a field where the user would just enter the word to match.
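A minimal sketch of how the proposed criterion might behave, reusing the regex above: the user supplies only the word, which is escaped (so that a word like "C++" is taken literally) and wrapped in the same ASCII-letter boundary pattern. `subjectWholeWord` and its operator strings are hypothetical illustrations of the "is contained" / "is not contained" options, not FiltaQuilla's actual API.

```javascript
// Hypothetical sketch of a "Subject: whole word" criterion; not FiltaQuilla's API.
// Boundary rule (same as the regex above): any character that is not an ASCII letter,
// or the start/end of the subject.

// Escape regex metacharacters so the user's word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// operator: "is contained" or "is not contained", as proposed in the post.
function subjectWholeWord(subject, operator, word) {
  const re = new RegExp("(^|[^A-Za-z])" + escapeRegExp(word) + "([^A-Za-z]|$)", "i");
  const found = re.test(subject);
  switch (operator) {
    case "is contained":
      return found;
    case "is not contained":
      return !found;
    default:
      throw new Error("Unknown operator: " + operator);
  }
}

console.log(subjectWholeWord("NATO summit next week", "is contained", "NATO"));           // true: start of subject
console.log(subjectWholeWord("Meeting with nato", "is contained", "NATO"));               // true: end of subject, any case
console.log(subjectWholeWord("Find the common denominator", "is contained", "NATO"));     // false
console.log(subjectWholeWord("Find the common denominator", "is not contained", "NATO")); // true
```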

Thanks for considering this addition!

rkent (Admin), May 12, 2010, 1:34 pm

I'll add this to my list of possible additions.

But given your other post, how would that work with non-ASCII characters?
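To illustrate the concern: with `[^A-Za-z]`, an accented letter counts as a boundary, so "union" is found inside "réunion". The second regex is one possible fix, assuming a JavaScript engine with Unicode property escapes and the `u` flag (ES2018 and later, not available in the engines of 2010): it treats any Unicode letter, combining mark or digit as part of a word.

```javascript
// The concern: [^A-Za-z] treats accented letters such as "é" as word boundaries.
const asciiUnion = /(^|[^A-Za-z])union([^A-Za-z]|$)/i;
console.log(asciiUnion.test("Compte rendu de la réunion")); // true: false positive

// Possible fix (assumes ES2018+ Unicode property escapes with the u flag):
// letters (\p{L}), combining marks (\p{M}) and digits (\p{N}) are part of a word.
const unicodeUnion = /(^|[^\p{L}\p{M}\p{N}])union([^\p{L}\p{M}\p{N}]|$)/iu;
console.log(unicodeUnion.test("Compte rendu de la réunion")); // false
console.log(unicodeUnion.test("Union européenne"));            // true
```

Note that this variant also treats digits as part of a word, unlike the original `[^A-Za-z]` pattern.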

t2m (Member), May 18, 2010, 1:54 am

Hmmm.

In fact, I looked into a JavaScript regexp tutorial, and there is a shortcut for matching word boundaries (\b). After some testing, it seems to properly treat accented characters as part of a word, and the start and end of the subject as word boundaries.

Said differently, a better regexp than the one above is:

/\bmyword\b/i
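A quick check of `\b` on ASCII text. In JavaScript, `\b` matches where a word character (`\w`, i.e. `[A-Za-z0-9_]`) meets a non-word character, or at the start or end of the string; digits and `_` therefore count as part of a word.

```javascript
// \b is a zero-width assertion: it matches where a word character (\w, i.e. [A-Za-z0-9_])
// meets a non-word character, or at the start/end of the string.
const natoWord = /\bNATO\b/i;

console.log(natoWord.test("NATO summit"));        // true: start of subject is a boundary
console.log(natoWord.test("Ask nato."));          // true: "." is a non-word character
console.log(natoWord.test("common denominator")); // false: "nato" is inside a word
console.log(natoWord.test("NATO_2010"));          // false: "_" counts as a word character
```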

Anyway, to make it more discoverable, it may still be worth adding as a specific criterion…?

t2m (Member), May 18, 2010, 2:08 am

Thinking more about it, I'm not sure my test was conclusive: \b may not treat accented characters as part of a word.
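A quick test supports this doubt: JavaScript's `\b` is defined in terms of ASCII `\w`, so an accented letter such as "é" counts as a non-word character, i.e. as a boundary. Adding the `u` flag does not change this for such letters.

```javascript
// \b is defined in terms of \w, which covers only [A-Za-z0-9_].
// "é" is therefore a non-word character, so a boundary sits right next to it.
const unionWord = /\bunion\b/i;
console.log(unionWord.test("Union européenne")); // true, as expected
console.log(unionWord.test("la réunion"));       // true: false positive inside "réunion"

// The u flag does not change this for letters such as "é".
const unionWordU = /\bunion\b/iu;
console.log(unionWordU.test("la réunion"));      // true

const cafWord = /\bcaf\b/i;
console.log(cafWord.test("un café"));            // true: "é" ends the "word" early
```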

The easiest way I see is to use a positive match against word-boundary characters, instead of a negative match against characters that can be part of a word.

A better regex would then be something like:

(^|[-_,;.:*=|\s])myword([-_,;.:*=|\s]|$)

(\s matches any whitespace character)
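The positive-match approach in JavaScript, with "union" in place of "myword" so the accented case can be checked. Inside a character class, `|` is a literal pipe and a leading `-` is a literal hyphen, so only the listed characters (plus whitespace via `\s`) act as separators; anything else, such as "é" or "(", does not.

```javascript
// The positive-match regex, with "union" in place of "myword".
// Separators: "-", "_", ",", ";", ".", ":", "*", "=", "|" and whitespace (\s).
const separatorUnion = /(^|[-_,;.:*=|\s])union([-_,;.:*=|\s]|$)/i;

console.log(separatorUnion.test("Union, enfin"));    // true: start of subject, then ","
console.log(separatorUnion.test("pro-union march")); // true: "-" is a separator
console.log(separatorUnion.test("la réunion"));      // false: "é" is not a separator
console.log(separatorUnion.test("unions"));          // false
console.log(separatorUnion.test("(union)"));         // false: "(" is not in the list
```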

rkent (Admin), May 18, 2010, 6:48 am (post edited 8:36 am, May 18, 2010, by rkent)

That will be a fairly easy addition to FiltaQuilla. Thanks for suggesting it.

You've also got me thinking that it might be good to have "named" JavaScript custom filters. Since you can easily write the correct JavaScript for your filter, it should be easy to add a named custom filter of your own containing that test, which could then be used like any other filter. I'd like to consider supporting that concept as well.
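A hypothetical sketch of what "named" custom filters could look like: a registry mapping a user-chosen name to a JavaScript predicate on the subject. `defineNamedFilter`, `runNamedFilter` and the registry are illustrations only, not an existing FiltaQuilla feature or API.

```javascript
// Hypothetical registry of named custom filters (name -> predicate on the subject).
// None of these names are FiltaQuilla APIs.
const namedFilters = new Map();

function defineNamedFilter(name, predicate) {
  if (namedFilters.has(name)) {
    throw new Error("Filter already defined: " + name);
  }
  namedFilters.set(name, predicate);
}

function runNamedFilter(name, subject) {
  const predicate = namedFilters.get(name);
  if (!predicate) {
    throw new Error("No such filter: " + name);
  }
  return Boolean(predicate(subject));
}

// A user-defined whole-word test, written once and then reused by name.
defineNamedFilter("Subject has XMPP as a word", (subject) =>
  /(^|[-_,;.:*=|\s])xmpp([-_,;.:*=|\s]|$)/i.test(subject)
);

console.log(runNamedFilter("Subject has XMPP as a word", "pro-XMPP news")); // true
console.log(runNamedFilter("Subject has XMPP as a word", "XMPPish"));       // false
```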

t2m (Member), May 18, 2010, 8:11 am

About the regex above:

- there must be a backslash before the s, i.e. \s (it was stripped by the forum engine)

- the list of punctuation characters should be extended to include more characters (e.g. parentheses…)

- I think it is better not to treat "-" as part of a word in this regex (for instance, so that "pro-XMPP" matches the word "XMPP")
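A sketch of a regex builder with an extended separator list. The exact list (parentheses, brackets, braces, quotes, `!`, `?`, `/`, `<`, `>`) is an assumption; `-` is kept as a separator so that "pro-XMPP" matches the word "XMPP", which is the behavior the example above asks for. `separatorWholeWord` is a hypothetical name.

```javascript
// Hypothetical builder for a "whole word" regex with an extended separator list.
// Assumption: this particular list of separators; adjust to taste.
// "-" comes first in the class (a literal hyphen) so that "pro-XMPP" matches "XMPP".
const SEPARATOR_CLASS = "[-_,;.:*=|()\\[\\]{}<>\"'!?/\\s]";

// Escape regex metacharacters so the word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function separatorWholeWord(word) {
  return new RegExp(
    "(^|" + SEPARATOR_CLASS + ")" + escapeRegExp(word) + "(" + SEPARATOR_CLASS + "|$)",
    "i"
  );
}

const xmpp = separatorWholeWord("XMPP");
console.log(xmpp.test("pro-XMPP support"));    // true
console.log(xmpp.test("Talk (XMPP) tonight")); // true: parentheses are separators
console.log(xmpp.test("[XMPP] list"));         // true
console.log(xmpp.test("XMPPS is different"));  // false
```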

And I agree with you that being able to define custom filters would be really great!

If you implement this, you could extend it to allow custom actions too.

(the next step is to build an app store for custom filters and actions, and target world domination)
