Whole word matching ?
May 12, 2010
5:53 am
Member
Forum Posts: 7
Member Since:
May 12, 2010
Offline

I use FiltaQuilla regex match to match whole words in subjects, with regex such as:

/(^|[^A-Za-z])myword([^A-Za-z]|$)/i

 

This is very useful for matching short acronyms in subjects without matching common words that happen to contain them (e.g. avoiding a match on "NATO" in "denominator"), and without missing the acronyms when they appear at the start or end of a subject (which is the problem if you match on " XXX ", with spaces around the acronym).
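As an illustration of the behaviour described above, here is a minimal sketch that can be run outside Thunderbird. The sample subjects are made up for the example, and "nato" stands in for "myword".

```javascript
// Whole-word pattern from the post, with "nato" as the example acronym.
const wholeWord = /(^|[^A-Za-z])nato([^A-Za-z]|$)/i;

// Hypothetical subjects, chosen to cover the cases mentioned in the post.
const subjects = [
  "NATO summit announced",      // acronym at the start of the subject
  "Summit hosted by NATO",      // acronym at the end of the subject
  "Re: (NATO) press release",   // acronym surrounded by punctuation
  "Lowest common denominator",  // "nato" buried inside a longer word
];

// Regex approach: matches the first three subjects only.
const results = subjects.map((s) => wholeWord.test(s));
console.log(results); // [ true, true, true, false ]

// Naive approach: acronym with a space on each side.
const spaced = subjects.map((s) => s.includes(" NATO "));
console.log(spaced); // [ false, false, false, false ]
```

The space-delimited search avoids "denominator" but also misses every subject where the acronym touches the start, the end, or a punctuation character.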

 

But since this regex syntax is quite heavy, unreadable and error-prone, I wanted to suggest adding this kind of whole-word matching on the subject as a matching criterion in its own right. It could be named "Subject: whole word", with options such as "is contained" and "is not contained", followed by a field where the user would just have to enter the word to match.
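To make the suggestion concrete, here is a rough sketch of what such a criterion could do with the word the user types. The function names are hypothetical and this is not FiltaQuilla code; it only shows the regex being built from plain text so that the user never has to write it.

```javascript
// Hypothetical helper: escape regex metacharacters in user-entered text
// so that a word like "C++" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical "Subject: whole word / is contained" test.
// Uses the same letter-based boundaries as the regex in the post.
function subjectContainsWholeWord(subject, word) {
  const pattern = new RegExp(
    "(^|[^A-Za-z])" + escapeRegExp(word) + "([^A-Za-z]|$)",
    "i"
  );
  return pattern.test(subject);
}

console.log(subjectContainsWholeWord("Summit hosted by NATO", "nato"));     // true
console.log(subjectContainsWholeWord("Lowest common denominator", "NATO")); // false
console.log(subjectContainsWholeWord("Join the C++ group", "C++"));         // true
```

The "is not contained" option would simply be the negation of this test.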

 

Thanks for considering this addition!

 

May 12, 2010
1:34 pm
Admin
Forum Posts: 323
Member Since:
July 12, 2008
Offline

I'll add this to my list of possible additions.

But given your other post, how will that work with non-ASCII characters?

May 18, 2010
1:54 am
Member
Forum Posts: 7
Member Since:
May 12, 2010
Offline

Hmmm.

In fact, I looked into a JavaScript regexp tutorial, and there is a shortcut to match word boundaries (\b). After some testing, it seems to properly treat accented characters as part of the word, and the start and end of the subject as word boundaries.

 

Said differently, a better regexp than the one above is:

/\bmyword\b/i

 

Anyway, to make it more easily discoverable, it is still possibly worth adding as a specific criterion…?

 

 

May 18, 2010
2:08 am
Member
Forum Posts: 7
Member Since:
May 12, 2010
Offline

While thinking more about it, I'm not sure my test was conclusive: \b may not be taking accented chars into account as part of a word.

The easiest way I see is to use a positive match against word boundary chars, instead of a negative match against chars that can be part of a word.

The better way to make such a regex would then be something like:

 

(^|[-_,;.:*=|s])myword([-_,;.:*=|s]|$)

 

(s matches any space character)
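The doubt about the word-boundary shortcut can be checked directly. In JavaScript, \b is defined in terms of \w, and \w covers only ASCII letters, digits and underscore, so accented characters are treated as non-word characters. The sketch below shows this, plus one possible Unicode-aware alternative using \p{…} property escapes and lookbehind. That alternative is an assumption on my part: it needs a modern (ES2018+) engine and is not something discussed in this thread. The function name is hypothetical.

```javascript
const cafe = "Un caf\u00e9 noir";      // "Un café noir"
const ete = "Un \u00e9t\u00e9 chaud";  // "Un été chaud"

// \b only knows ASCII word characters, so "é" looks like a boundary.
const asciiCaf = /\bcaf\b/i.test(cafe);           // true: "caf" matches inside "café"
const asciiEte = /\b\u00e9t\u00e9\b/i.test(ete);  // false: no boundary next to "é"

// Hypothetical Unicode-aware variant: the word must not be preceded or
// followed by a letter, combining mark, digit or underscore.
// Assumes `word` contains no regex metacharacters.
function unicodeWholeWord(word) {
  const wordChar = "[\\p{L}\\p{M}\\p{N}_]";
  return new RegExp("(?<!" + wordChar + ")" + word + "(?!" + wordChar + ")", "iu");
}

const uniCaf = unicodeWholeWord("caf").test(cafe);            // false
const uniEte = unicodeWholeWord("\u00e9t\u00e9").test(ete);   // true

console.log(asciiCaf, asciiEte, uniCaf, uniEte); // true false false true
```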

May 18, 2010
6:48 am
Admin
Forum Posts: 323
Member Since:
July 12, 2008
Offline

That will be a fairly easy addition to FiltaQuilla. Thanks for suggesting it.

You've also got me thinking that it might be good to have "named" JavaScript custom filters. That is, since you can easily write the correct JavaScript for your filter, it should be easy to add a named JavaScript custom filter of your own that contains that test, and that could then be added like any other filter. I'd like to consider supporting that concept as well.

May 18, 2010
8:11 am
Member
Forum Posts: 7
Member Since:
May 12, 2010
Offline

About the regex above:

- there must be a backslash before each s, i.e. \s (it was stripped by the forum engine)

- the list of punctuation chars should be extended to include more chars (e.g. parentheses…)

- I think it is better to not include "-" as part of this regex (for instance to allow "pro-XMPP" to match the "XMPP" word)
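For anyone who wants to experiment with the delimiter list, here is a small sketch with the backslash restored and parentheses added. The helper name and the test subjects are made up for the example. It builds the regex with and without "-" in the list so that the effect on "pro-XMPP" can be seen: with a positive list, a character has to be in the list for a word next to it to match.

```javascript
// Build a whole-word regex from an explicit class of delimiter characters.
// Hypothetical helper; assumes `word` contains no regex metacharacters.
function delimitedWord(word, delimiterClass) {
  return new RegExp(
    "(^|" + delimiterClass + ")" + word + "(" + delimiterClass + "|$)",
    "i"
  );
}

// List from the earlier post, with \s restored and parentheses added.
const withHyphen = "[-_,;.:*=|()\\s]";
// Same list with "-" left out.
const withoutHyphen = "[_,;.:*=|()\\s]";

const a = delimitedWord("XMPP", withHyphen);
const b = delimitedWord("XMPP", withoutHyphen);

console.log(a.test("New (XMPP) server")); // true
console.log(a.test("myXMPPclient"));      // false
console.log(a.test("pro-XMPP rally"));    // true
console.log(b.test("pro-XMPP rally"));    // false
```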

 

And I agree with you that being able to define custom filters would be really great!

If you implement this, you could extend it to allow custom actions too.

 

(the next step is to build an appstore for custom filters and actions and target world domination)

Forum Timezone: UTC -8
