Issue 3547 in pharo: Have a look at the new match: implementation

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Issue 3547 in pharo: Have a look at the new match: implementation

pharo
Status: Accepted
Owner: [hidden email]
Labels: Milestone-1.3

New issue 3547 by [hidden email]: Have a look at the new match:  
implementation
http://code.google.com/p/pharo/issues/detail?id=3547

Collections-ul.420

     Hi Squeakers,

     Latest trunk version broke the Aida/web and Herbert managed to find the
     reason in new behavior of String>>match: which doesn't allow two '**' in
     pattern anymore. In Aida the matching patterns are usually composed out
     of many strings and as it happens two stars occurs in Aida's default
     installation already.

     Because this new match behavior (not allowing two '**') is new only in
     Squeak but allowed in all other Smalltalks, I propose to reconsider this
     decision and better retract back to old behavior.


answer:
------

The old behavior was definitely wrong. Here are two examples with the old  
behavior from the corresponding issue  
(http://bugs.squeak.org/view.php?id=6841 ):

'*#' match: 'e'. "false"
'**' match: 'e'. "false"

If you look at the previous version of the method from 2004 (I don't know  
if this is the original or not), then you'll see that it tried to raise an  
error if the string contained '**', but it failed to do so because of this  
bug. So the intended behavior - to disallow ** in the pattern - is not new  
at all.

IMHO the best solution to this problem is to rewrite the method to allow  
any combination of * and # in the pattern.


Reply | Threaded
Open this post in threaded view
|

Re: Issue 3547 in pharo: Have a look at the new match: implementation

pharo

Comment #1 on issue 3547 by [hidden email]: Have a look at the new  
match: implementation
http://code.google.com/p/pharo/issues/detail?id=3547

A new version of Collections was added to project The Inbox:
http://source.squeak.org/inbox/Collections-ul.420.mcz

==================== Summary ====================

Name: Collections-ul.420
Author: ul
Time: 15 January 2011, 1:57:14.662 pm
UUID: aadc4c69-dd68-4f46-8370-dda4404f0153
Ancestors: Collections-mtf.419

- do not raise an error in String >> #match: if the receiver contains '**'  
or '*#' as a substring.
- fixed the comment in String >> #match:

=============== Diff against Collections-mtf.419 ===============

Item was changed:
  ----- Method: String>>match: (in category 'comparing') -----
  match: text
        "Answer whether text matches the pattern in this string.
        Matching ignores upper/lower case differences.
        Where this string contains #, text may contain any character.
        Where this string contains *, text may contain any sequence of  
characters."

        ^ self startingAt: 1 match: text startingAt: 1
  "
        '*'                     match: 'zort' true
        '*baz'          match: 'mobaz' true
        '*baz'          match: 'mobazo' false
        '*baz*'         match: 'mobazo' true
        '*baz*'         match: 'mozo' false
        'foo*'          match: 'foozo' true
        'foo*'          match: 'bozo' false
        'foo*baz'       match: 'foo23baz' true
        'foo*baz'       match: 'foobaz' true
        'foo*baz'       match: 'foo23bazo' false
        'foo'           match: 'Foo' true
        'foo*baz*zort' match: 'foobazort' false
+       'foo*baz*zort' match: 'foobazzort' true
-       'foo*baz*zort' match: 'foobazzort' false
        '*foo#zort'     match: 'afoo3zortthenfoo3zort' true
        '*foo*zort'     match: 'afoodezortorfoo3zort' true
  "!

Item was changed:
  ----- Method: String>>startingAt:match:startingAt: (in  
category 'comparing') -----
  startingAt: keyStart match: text startingAt: textStart
        "Answer whether text matches the pattern in this string.
        Matching ignores upper/lower case differences.
        Where this string contains #, text may contain any character.
        Where this string contains *, text may contain any sequence of  
characters."
+
        | anyMatch matchStart matchEnd i matchStr j ii jj |
        i := keyStart.
        j := textStart.
+
+       "Process consecutive *s and #s at the beginning."
+       anyMatch := false.
+       [ i <= self size and: [
+               (self at: i)
+                       caseOf: {
+                               [ $* ] -> [
+                                       anyMatch := true.
+                                       i := i + 1.
+                                       true ].
+                               [ $# ] -> [
+                                       i := i + 1.
+                                       j := j + 1.
+                                       true ] }
+                       otherwise: [ false ] ] ] whileTrue.
+       i > self size ifTrue: [
+               ^j - 1 = text size or: [ "We reached the end by matching  
the character with a #."
+                       anyMatch and: [ j <= text size ] "Or there was a *  
before the end." ] ].
+       matchStart := i.

-       "Check for any #'s"
-       [i > self size ifTrue: [^ j > text size "Empty key matches only  
empty string"].
-       (self at: i) = $#] whileTrue:
-               ["# consumes one char of key and one char of text"
-               j > text size ifTrue: [^ false "no more text"].
-               i := i+1.  j := j+1].
-
-       "Then check for *"
-       (self at: i) = $*
-               ifTrue: [i = self size ifTrue:
-                                       [^ true "Terminal * matches all"].
-                               "* means next match string can occur  
anywhere"
-                               anyMatch := true.
-                               matchStart := i + 1]
-               ifFalse: ["Otherwise match string must occur immediately"
-                               anyMatch := false.
-                               matchStart := i].
-
        "Now determine the match string"
        matchEnd := self size.
+       (ii := self indexOf: $* startingAt: matchStart) > 0 ifTrue: [  
matchEnd := ii-1 ].
+       (ii := self indexOf: $# startingAt: matchStart) > 0 ifTrue: [  
matchEnd := matchEnd min: ii-1 ].
-       (ii := self indexOf: $* startingAt: matchStart) > 0 ifTrue:
-               [ii = matchStart ifTrue: [self error: '** not valid -- use  
* instead'].
-               matchEnd := ii-1].
-       (ii := self indexOf: $# startingAt: matchStart) > 0 ifTrue:
-               [ii = matchStart ifTrue: [self error: '*# not valid -- use  
#* instead'].
-               matchEnd := matchEnd min: ii-1].
        matchStr := self copyFrom: matchStart to: matchEnd.

        "Now look for the match string"
        [jj := text findString: matchStr startingAt: j caseSensitive: false.
        anyMatch ifTrue: [jj > 0] ifFalse: [jj = j]]
                whileTrue:
                ["Found matchStr at jj.  See if the rest matches..."
                (self startingAt: matchEnd+1 match: text startingAt: jj +  
matchStr size) ifTrue:
                        [^ true "the rest matches -- success"].
                "The rest did not match."
                anyMatch ifFalse: [^ false].
                "Preceded by * -- try for a later match"
                j := j+1].
        ^ false "Failed to find the match string"!



Reply | Threaded
Open this post in threaded view
|

Re: Issue 3547 in pharo: Have a look at the new match: implementation

pharo
Updates:
        Labels: Type-Squeak Difficulty-Easy

Comment #2 on issue 3547 by [hidden email]: Have a look at the new  
match: implementation
http://code.google.com/p/pharo/issues/detail?id=3547

(No comment was entered for this change.)


Reply | Threaded
Open this post in threaded view
|

Re: Issue 3547 in pharo: Have a look at the new match: implementation

pharo

Comment #3 on issue 3547 by [hidden email]: Have a look at the new  
match: implementation
http://code.google.com/p/pharo/issues/detail?id=3547

ame: CollectionsTests-ul.179
Author: ul
Time: 15 January 2011, 1:58:09.591 pm
UUID: eda65a29-e17b-634b-b99c-e6f862302701
Ancestors: CollectionsTests-nice.178

- updated StringTest >> #testMatch to reflect the changes of  
Collections-ul.420

=============== Diff against CollectionsTests-nice.178 ===============

Item was changed:
  ----- Method: StringTest>>testMatch (in category 'test-comparing') -----
  testMatch

+        
#('**' 'f**'  'f**o' 'f*' '*f*' 'f#*' 'f##' '*oo' '#oo' '*o*' '#o#' '#o*' '*o#' 'fo*' 'fo#' '*foo*' '###' '*#'  'f*#' 'f*#o')
-       #('**' '*#' 'f**' 'f*#' 'f**o' 'f*#o') do: [ :each |
-               self should: [ each match: 'foo' ] raise: Error ].
-        
#('f*' '*f*' 'f#*' 'f##' '*oo' '#oo' '*o*' '#o#' '#o*' '*o#' 'fo*' 'fo#' '*foo*' '###')
                do: [ :each | self assert: (each match: 'foo') ].
        #('bar' 'foo#' '#foo' '*foo#' '#foo*' '*bar*') do: [ :each |
                self deny: (each match: 'foo') ]!