Wildcards in text searches

Wildcards in text searches

Ian Bartholomew-18
Andy/Blair,

Answering Jurko's post reminded me of a problem that I had noticed
before but never chased.  When you pop up the dialog for "browse methods
containing text" it informs you that wildcards are allowed.  This
doesn't work in most cases, as the
String>>matchPatternFrom:in:from:ignoreCase: method only answers true
if the _whole_ of the wildcarded pattern matches the _whole_ of the
source.  So ...

'b#d' match: 'abcdef' ==> false
'd#f' match: 'abcdef' ==> false
'ab#d#f' match: 'abcdef' ==> true

'a*e' match: 'abcdef' ==> false
'a*f' match: 'abcdef' ==> true

It would appear that String>>match: is not really suitable for wildcard
searches on embedded text
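
The difference between a whole-string match and an embedded search can be modelled outside the image. The following is a minimal sketch in Python rather than Smalltalk, and it is not the Dolphin implementation: it assumes '*' means any run of characters and '#' means exactly one character, it is case-sensitive, and the function names are made up for illustration.

```python
import re

def to_regex(pattern):
    """Translate a '*' / '#' wildcard pattern into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == '*':
            parts.append('.*')           # any run of characters, possibly empty
        elif ch == '#':
            parts.append('.')            # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return ''.join(parts)

def anchored_match(pattern, text):
    """True only if the whole pattern matches the whole text."""
    return re.fullmatch(to_regex(pattern), text, re.DOTALL) is not None

def embedded_match(pattern, text):
    """True if the pattern matches anywhere inside the text."""
    return re.search(to_regex(pattern), text, re.DOTALL) is not None

for pattern in ['b#d', 'd#f', 'ab#d#f', 'a*e', 'a*f']:
    print(pattern, anchored_match(pattern, 'abcdef'), embedded_match(pattern, 'abcdef'))
```

Under those assumptions the anchored column reproduces the results listed above, while the embedded column is true for all five patterns, which is what a "containing text" search would be expected to do.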

--
Ian

Use the Reply-To address to contact me.
Mail sent to the From address is ignored.



Re: Wildcards in text searches

Chris Uppal-3
Ian Bartholomew wrote:

> It would appear that String>>match: is not really suitable for wildcard
> searches on embedded text

It assumes that the pattern is anchored at both ends.  One fix is to modify
SmalltalkSystem>>sourceFilterFor:

============
sourceFilterFor: aString
 "Private - Answer a <monadicValuable> that can be used to select from a
 collection of methods those which contain the specified source string or
 pattern."

 ^(aString includesAnyOf: '*?#')
  ifTrue:
   [| pattern |
   pattern := aString copyReplacing: $? withObject: $#.
   "Unanchor the pattern, without adding a redundant '*' at either end.
   Note: the result of #ifFalse: must not be assigned back to pattern,
   as it is nil when the test answers true."
   (pattern beginsWith: '*') ifFalse: [pattern := '*' , pattern].
   (pattern endsWith: '*') ifFalse: [pattern := pattern , '*'].
   [:each |
   | src |
   (src := each getSource) notNil and: [pattern match: src]]]
  ifFalse: [[:each | each containsSource: aString]]
============

The explicit tests for a leading/trailing '*' are necessary, I think, since
the pattern matching algorithm is O(n^m) where m is the number of '*'s in the
pattern.
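
The wrapping step on its own can be sketched as follows. This is a Python model rather than Smalltalk, and `unanchor` is a hypothetical name used only for illustration; it mirrors the three pattern-rewriting lines of the method above and nothing else.

```python
def unanchor(pattern):
    """Turn a whole-string wildcard pattern into an 'anywhere in the text' one."""
    # '?' is accepted as a synonym for '#', the single-character wildcard.
    pattern = pattern.replace('?', '#')
    # Add a '*' at each end only when one is not already there, so that the
    # number of '*'s (and with it the matching cost) is not increased needlessly.
    if not pattern.startswith('*'):
        pattern = '*' + pattern
    if not pattern.endswith('*'):
        pattern = pattern + '*'
    return pattern

print(unanchor('b?d'))    # *b#d*
print(unanchor('*b#d'))   # *b#d*
print(unanchor('a*e*'))   # *a*e*
```

A pattern that already starts and ends with '*' passes through unchanged.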

(If anyone's interested, I do have an implementation of wildcard matching that
is backtracking free, and so does not suffer from pathological behaviour for
"complicated" patterns.  It's more complicated and doesn't usually make much
difference in practice, though.)
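
For anyone curious what a backtracking-free matcher can look like, here is one common approach, sketched in Python. It is not the implementation mentioned above (which is not shown in this thread); it is a plain state-set simulation, assuming '*' matches any run of characters and '#' matches exactly one, with case-sensitive comparison. The function name is made up for illustration.

```python
def wildcard_match(pattern, text):
    """Anchored wildcard match without backtracking.

    Tracks the set of pattern positions that could be 'current' after each
    character of text, so the work is bounded by roughly
    len(pattern) * len(text) however many '*'s the pattern contains.
    """
    end = len(pattern)

    def closure(states):
        # A '*' may match nothing, so being at a '*' also means being just past it.
        result = set()
        for i in states:
            while i not in result:
                result.add(i)
                if i < end and pattern[i] == '*':
                    i += 1
        return result

    states = closure({0})
    for ch in text:
        next_states = set()
        for i in states:
            if i == end:
                continue                 # pattern exhausted, cannot consume ch
            p = pattern[i]
            if p == '*':
                next_states.add(i)       # '*' absorbs ch and stays put
            elif p == '#' or p == ch:
                next_states.add(i + 1)   # single-character step
        states = closure(next_states)
        if not states:
            return False                 # no way to continue
    return end in states

print(wildcard_match('a*f', 'abcdef'))
print(wildcard_match('a*e', 'abcdef'))
print(wildcard_match('*b#d*', 'abcdef'))
```

A pattern such as 'a*a*a*a*b' against a long run of 'a's, which is the kind of input that hurts a backtracking matcher, is rejected here in a single pass over the text.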

    -- chris



Re: Wildcards in text searches

Ian Bartholomew-18
Chris,

> It assumes that the pattern is anchored at both ends.  One fix is to
> modify SmalltalkSystem>>sourceFilterFor:
[]

Well spotted.  I had a look at modifying the #match:* methods but it
looked a bit dangerous.  I didn't think of moving back a step and
adjusting the search string before it got that far.

--
Ian

Use the Reply-To address to contact me.
Mail sent to the From address is ignored.



Re: Wildcards in text searches

Carsten Haerle
In reply to this post by Ian Bartholomew-18
Why don't you just append a "*" at the end of the pattern?
Sometimes it is important that the pattern matches the whole text, and by
appending "*" you can always get the functionality you need. If the method
were changed, you could not match the whole string any more.

Regards

Carsten