ByteString>>match: greedyness of * ??


Ch Lamprecht
Hello,

I found the following results for some expressions using #match:

'e' match: 'e'.   "true"
'*' match: 'e'.   "true"
'#' match: 'e'.   "true"

'*e' match: 'e'.  "true"
'*#' match: 'e'.  "false"
'**' match: 'e'.  "false"

'*' match: ''.    "true"
'**' match: ''.   "false"


Is this expected behavior?
It looks like * is sometimes 'greedy' and sometimes not (compare the fourth and fifth expressions).
Thank you for any hints.

Christoph
_______________________________________________
Beginners mailing list
[hidden email]
http://lists.squeakfoundation.org/mailman/listinfo/beginners

Re: ByteString>>match: greedyness of * ??

Nicolas Cellier-3
This behavior is specific to Squeak; other Smalltalks match differently:

VW:  '**' match: 'e'. "true"
gst: '**' match: 'e'. "true"

Anyway, this pattern matching is limited. How do you match a '*' itself?
I thought your example might be interpreted as an escape sequence, but
no, there is no escaping in this simple matching.

'**' match: '*'. "false"
'\*' match: '*'. "false"

Try VBregex or another regex package.

Nicolas
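
As an illustration of the regex suggestion above, here is a minimal sketch. It assumes that Vassili Bykov's Regex package (presumably the "VBregex" mentioned) is loaded and that it adds #matchesRegex: to String. The patterns are illustrative and do not come from the thread.

```smalltalk
"Assumption: the Regex package is loaded, so String understands #matchesRegex:."
"Each expression below is expected to answer true."
'e' matchesRegex: '.*'.    ".* matches any sequence of characters, including none"
'e' matchesRegex: '.'.     ". matches exactly one character, much like # in #match:"
'*' matchesRegex: '\*'.    "the backslash escapes the star, so a literal * can be matched"
```

Note that the roles are reversed compared with #match: here the text is the receiver and the pattern is the argument.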



Re: Re: ByteString>>match: greedyness of * ??

Ch Lamprecht
Hi,
thank you.
In addition to the expressions from my first message, I found that #match: does not behave as
stated in the comment in the method definition itself:

From ByteString>>match:

"
        [snip]
        'foo*baz' match: 'foo23baz' true
        'foo*baz' match: 'foobaz' true    <----
        'foo*baz' match: 'foo23bazo' false
        'foo' match: 'Foo' true
        'foo*baz*zort' match: 'foobazort' false
        'foo*baz*zort' match: 'foobazzort' false <----
        [snip]
"

confused, Christoph


Re: Re: ByteString>>match: greedyness of * ??

Kent Loobey
On Tuesday 08 January 2008 13:03:22 Ch Lamprecht wrote:

>
> Hi,
> thank you.
> In addition to the expressions below, I found, that #match: does not behave
> as stated by the comment given in the method definition itself:
>
>  From ByteString>>match:
>
> "
> [snip]
> 'foo*baz' match: 'foo23baz' true
> 'foo*baz' match: 'foobaz' true    <----
> 'foo*baz' match: 'foo23bazo' false
> 'foo' match: 'Foo' true
> 'foo*baz*zort' match: 'foobazort' false
> 'foo*baz*zort' match: 'foobazzort' false <----
> [snip]
> "

In general, * matches any sequence of characters, including none.

So the first one is foo, then any characters, then baz.

The third is false because of the "o" on the end.  If you wanted it to match,
you could use 'foo*baz*'.

I don't know why the last one wasn't reported as true.
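
The points above can be sketched as workspace expressions. The first and third results are the ones given in the method comment quoted above. The second is the expected result for the suggested pattern, based on a reading of the method rather than a test on every Squeak version.

```smalltalk
"The pattern is the receiver, the text is the argument."
'foo*baz'  match: 'foo23bazo'.   "false: nothing in the pattern accounts for the trailing o"
'foo*baz*' match: 'foo23bazo'.   "expected true: the terminal * matches the rest of the text"
'foo*baz'  match: 'foobaz'.      "true: * may match an empty sequence"
```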


Re: Re: ByteString>>match: greedyness of * ??

Ch Lamprecht
Hello,

I browsed String>>startingAt:match:startingAt:

and changed two lines to make the error handling work as the author probably
intended.

startingAt: keyStart match: text startingAt: textStart
        "Answer whether text matches the pattern in this string.
        Matching ignores upper/lower case differences.
        Where this string contains #, text may contain any character.
        Where this string contains *, text may contain any sequence of characters."
        | anyMatch matchStart matchEnd i matchStr j ii jj |
        i := keyStart.
        j := textStart.

        "Check for any #'s"
        [i > self size ifTrue: [^ j > text size "Empty key matches only empty string"].
        (self at: i) = $#] whileTrue:
                ["# consumes one char of key and one char of text"
                j > text size ifTrue: [^ false "no more text"].
                i := i+1.  j := j+1].

        "Then check for *"
        (self at: i) = $*
                ifTrue: [i = self size ifTrue:
                                        [^ true "Terminal * matches all"].
                                "* means next match string can occur anywhere"
                                anyMatch := true.
                                matchStart := i + 1]
                ifFalse: ["Otherwise match string must occur immediately"
                                anyMatch := false.
                                matchStart := i].

        "Now determine the match string"
        matchEnd := self size.
        (ii := self indexOf: $* startingAt: matchStart) > 0 ifTrue:
                ["changed the following line to:"
                ii = matchStart ifTrue: [self error: '** not valid -- use * instead'].
                matchEnd := ii-1].
        (ii := self indexOf: $# startingAt: matchStart) > 0 ifTrue:
                ["changed the following line to:"
                ii = matchStart ifTrue: [self error: '*# not valid -- use #* instead'].
                matchEnd := matchEnd min: ii-1].
        matchStr := self copyFrom: matchStart to: matchEnd.

        "Now look for the match string"
        [jj := text findString: matchStr startingAt: j caseSensitive: false.
        anyMatch ifTrue: [jj > 0] ifFalse: [jj = j]]
                whileTrue:
                ["Found matchStr at jj.  See if the rest matches..."
                (self startingAt: matchEnd+1 match: text startingAt: jj + matchStr size) ifTrue:
                        [^ true "the rest matches -- success"].
                "The rest did not match."
                anyMatch ifFalse: [^ false].
                "Preceded by * -- try for a later match"
                j := j+1].
        ^ false "Failed to find the match string"
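
A hedged sketch of how the changed method might be exercised in a workspace. It assumes that #match: simply calls startingAt: 1 match: text startingAt: 1 and that the version above is installed. The results in the comments follow from reading the code, not from a recorded run.

```smalltalk
"Assumption: the modified startingAt:match:startingAt: shown above is installed."
"With the two changed lines, the invalid patterns should signal an error instead of answering false."
[ '**' match: 'e' ] on: Error do: [ :ex | ex messageText ].   "expected: '** not valid -- use * instead'"
[ '*#' match: 'e' ] on: Error do: [ :ex | ex messageText ].   "expected: '*# not valid -- use #* instead'"

"The valid forms are unaffected."
'#*' match: 'e'.   "expected true: # consumes the e, then the terminal * matches the empty rest"
'*' match: ''.     "expected true: a terminal * matches everything, including the empty string"
```

The handler blocks answer the error's messageText, so each guarded expression evaluates to the message rather than opening a debugger.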


