Find after in strings?

Tim Mackinnon
Maybe this is a dumb question (and I’m often surprised when I ask these), but why is there no way to “find after” a string?

I find it rather boring to parse a string after a known marker, thus:
(loc := aString findString: 'marker') > 0 ifTrue: [ loc := loc + 'marker' size ].

Is there a better way? This whole pattern seems very old and clunky, and not Smalltalk-like.

Couldn’t we have: findAfter: aString ifAbsent: aBlock ?

Or is there a whole better pattern for string searching that I’m missing ?
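
For what it's worth, the selector proposed above could be sketched as a thin wrapper around the existing idiom. This is only an illustration: #findAfter:ifAbsent: is hypothetical and not part of any image I know of, and the sketch assumes #findString: answers 0 when the substring is absent, as it does in Squeak and Pharo.

```smalltalk
"Hypothetical extension method on String; not part of any standard image."
String >> findAfter: aString ifAbsent: aBlock
    "Answer the index of the first character after the first occurrence
     of aString in the receiver, or the value of aBlock if there is none."
    | loc |
    loc := self findString: aString.
    ^ loc > 0
        ifTrue: [ loc + aString size ]
        ifFalse: [ aBlock value ]
```

With this in place, 'abc marker xyz' findAfter: 'marker' ifAbsent: [ 0 ] would answer 11, the index of the space that follows the marker.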

Tim

Re: Find after in strings?

Richard O'Keefe
If you want to move around in strings, you might want to use a ReadStream.
In some Smalltalk systems there is a method called #skipToAll:.
Here's how mine starts out:

    skipToAll: aSequence
      "If the remaining elements can be parsed as <a><aSequence><b>,
       return true leaving the position where?  Otherwise return false.      
       GNU Smalltalk, Dolphin, and VisualAge:
         leave the position after <aSequence>, consistent with #upToAll:,
         and implementable in this class.  (GST uses the KMP algorithm.)
       VisualWorks and ST/X:
         leave the position before <aSequence>, inconsistent with #upToAll:,
         and only implementable for positionable streams.      
       Squeak 5.2 and Pharo 6.0:
         not provided.
       I cannot be compatible with everything.  The semantics of
       #skipToAll: should obviously match #upToAll:, so I'll fit
       in with GNU, Dolphin, and VisualAge Smalltalk.
      "

So
    (myReadStream skipToAll: 'marker')
      ifTrue:  [loc := myReadStream position]
      ifFalse: ["alternative code goes here"].
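
In an image without #skipToAll:, much the same effect can be had with #upToAll: alone. A minimal sketch, assuming the Squeak/Pharo behaviour in which #upToAll: leaves the stream positioned just past the match; the sample text is made up:

```smalltalk
| stream rest |
stream := ReadStream on: 'header text marker payload'.
"Read and discard everything up to and including the marker."
stream upToAll: 'marker'.
"Whatever is left is the text after the marker."
rest := stream upToEnd.
```

Note that rest is empty both when the marker is missing and when it is the last thing in the input, so this form cannot tell those two cases apart.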

HOWEVER, I have a bad feeling about this.  The entire approach, as with much
concerning strings in a Unicode age, seems fraught with peril.  Consider
input = 'the need for vigilance is never-ending'
marker = 'end'
Should the marker be found or not?
input = '.... Si<floating acute accent> ...'
marker = 'Si'
Should the marker be found or not?
My code is NOT sensitive to these issues.
I would like to say that it was because I was writing a compatibility
method, so my code was compatibly broken,
but to be honest, I was stupid and forgot to think it through.
I would think that there would need to be an '... asTokens: aBoolean'
variant that checks that a match
 - is not followed by floating diacriticals
 - is not preceded by an alphanumeric if the target begins with one
 - is not followed by an alphanumeric if the target ends with one.
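
The last two checks in that list might look something like the following. This is a rough sketch only: #isTokenMatch:at: is a made-up selector, it assumes a non-empty target and Character>>isAlphaNumeric as found in Squeak and Pharo, and it does nothing about floating diacriticals, which need real Unicode property data.

```smalltalk
"Hypothetical extension method on String."
String >> isTokenMatch: target at: start
    "Answer whether the occurrence of target beginning at index start
     in the receiver stands on its own as a token."
    | stop |
    stop := start + target size - 1.
    "Reject a match glued to a preceding alphanumeric."
    (target first isAlphaNumeric
        and: [ start > 1
        and: [ (self at: start - 1) isAlphaNumeric ] ])
            ifTrue: [ ^ false ].
    "Reject a match glued to a following alphanumeric."
    (target last isAlphaNumeric
        and: [ stop < self size
        and: [ (self at: stop + 1) isAlphaNumeric ] ])
            ifTrue: [ ^ false ].
    ^ true
```

For the 'never-ending' example, the only occurrence of 'end' is followed by 'i', so this check would reject it.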

My own preference is to write a lexical analyser for the mini-language
I'm using, and NOT try to hack at it using general-purpose string
methods.

Perhaps you can tell us more about the context?  What is the application-
level task you are trying to solve?



On Sat, 1 Jun 2019 at 22:01, Tim Mackinnon <[hidden email]> wrote:
[...]

Re: Find after in strings?

Ben Coman
In reply to this post by Tim Mackinnon
On Sat, 1 Jun 2019 at 18:01, Tim Mackinnon <[hidden email]> wrote:

> [...]

Can your input be split into tokens using String>>findTokens:
and then wrapped with `ReadStream on:`?  Then your marker will be consumed by a single #next.
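
A small sketch of that token idea, assuming #findTokens: and #skipTo: behave as they do in Squeak and Pharo; the sample sentence and the variable names are invented for the illustration:

```smalltalk
| tokens after |
"asArray keeps the stream backed by a plain Array."
tokens := ReadStream on: ('alpha beta marker gamma delta' findTokens: ' ') asArray.
after := (tokens skipTo: 'marker')
    ifTrue: [ tokens upToEnd ]    "the tokens following the marker"
    ifFalse: [ #() ].             "marker token not present"
```

Here #skipTo: compares whole tokens, so a marker buried inside a longer word is not matched.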

cheers -ben


Re: Find after in strings?

Richard O'Keefe
I forgot to mention that in order to be able to translate XPath to Smalltalk
I added these methods to my Smalltalk:
AbstractSequence>>
  afterSubCollection: aSequence [ifAbsent: exceptionBlock]
    "based on substring-after"
  beforeSubCollection: aSequence [ifAbsent: exceptionBlock]
    "based on substring-before"

I'm not going to display the code, which is pretty simple, because there is a
serious efficiency problem with these.  (A Java equivalent, had there been
one, would originally not have had this problem, because substring used to
share the backing array; since, I think, Java 7 it copies, so it would too.)
Imagine that we have a string containing N instances of a separator, and
we want to split it into pieces.
  [s includesSubCollection: separator] whileTrue: [
     self process: (s beforeSubCollection: separator).
     s := s afterSubCollection: separator].
  s isEmpty ifFalse: [
     self process: s].
Do I need to explain why this takes O(N**2) time?
However,
  r := ReadStream on: s.
  [r atEnd] whileFalse: [
     self process: (r upToAll: separator)].
is linear time (assuming an efficient #upToAll:).
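
To spell out the quadratic cost of the first loop: assume each piece is at most c characters long, so the string has roughly cN characters, and assume #afterSubCollection: copies the remainder of the string each time (both are assumptions made for the sake of the estimate). The k-th pass through the loop then handles about c(N-k) characters, giving

```latex
\sum_{k=1}^{N} c\,(N-k) \;=\; c\,\frac{N(N-1)}{2} \;=\; O(N^2)
```

The stream version never copies the unread remainder, so with an efficient #upToAll: the total work stays proportional to the length of the string.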

My previous remarks about boundary issues apply here as well, of course.
My previous remarks about wanting to see the context do too.


On Sun, 2 Jun 2019 at 00:43, Ben Coman <[hidden email]> wrote:
[...]

Re: Find after in strings?

Tim Mackinnon
In reply to this post by Richard O'Keefe
Interesting - there is no #skipToAll: in Pharo, I wonder why not? It sounds like what I was looking for, and I’m surprised it's not there. There is #skipTo: for a single object (which sounds right, just not the string equivalent).

I’m not doing anything special, just want to take some lines from the end of a class comment and use them in an Exercism exercise - but it's no different from many applications: find some tag and use the text after it. I’m kind of surprised it's not in Pharo.

Tim


On 1 Jun 2019, at 12:25, Richard O'Keefe <[hidden email]> wrote:

[...]

Re: Find after in strings?

Tim Mackinnon
Classic - I went to implement #skipUpToAll: and it's called #match: in Pharo... I struggle with some of the naming of string and stream methods in Pharo/Smalltalk, and the protocols are often not quite what I expect either (although in this case, #match: is in the “positioning” protocol, which is more in line with my thinking - but “searching” would be my first choice).

Tim


On 1 Jun 2019, at 20:49, Tim Mackinnon <[hidden email]> wrote:

[...]

Re: Find after in strings?

Richard O'Keefe
In reply to this post by Tim Mackinnon
To get #skipToAll: in Pharo, add this to PositionableStream.

skipToAll: aCollection
   "Set the receiver's position to just after the next occurrence of aCollection
    in the receiver's future values and answer true.  If there is no such


On Sun, 2 Jun 2019 at 07:50, Tim Mackinnon <[hidden email]> wrote:
[...]

Re: Find after in strings?

Richard O'Keefe
skipToAll: aCollection
   "Set the receiver's position to just after the next occurrence of aCollection
    in the receiver's future values and answer true.  If there is no such
    occurrence, answer false.  In either case, leave the position where #upToAll:
    would have left it."
    ^self match: aCollection

Sorry about the incomplete message.
#match: is such a bad name for this operation that the method comment has to
go to some trouble to explain that it is nothing like #match: for Strings.
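
For anyone who would rather not add a method at all, the existing stream #match: can be used directly for the original "text after a marker" task. A minimal sketch, assuming Pharo's PositionableStream>>match: as described above; the sample text is made up:

```smalltalk
| stream rest |
stream := ReadStream on: 'some text marker the rest'.
rest := (stream match: 'marker')
    ifTrue: [ stream upToEnd ]    "marker found: take what follows it"
    ifFalse: [ nil ].             "marker absent"
```

Unlike a bare #upToAll:, the Boolean answer makes it possible to tell a missing marker from a marker at the very end of the input.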


On Sun, 2 Jun 2019 at 14:29, Richard O'Keefe <[hidden email]> wrote:
[...]

Re: Find after in strings?

Sven Van Caekenberghe-2
Why add an alias ? The API is too wide as it is already.

Note that most current implementations of #upToAll: already use words like match, so #match: is not that crazy.

Yes it should be possible to talk about naming, but just adding aliases, no.

My opinion, of course.

> On 2 Jun 2019, at 04:33, Richard O'Keefe <[hidden email]> wrote:
> [...]

Re: Find after in strings?

Richard O'Keefe
The issue is that #skipToAll: is the de facto standard name for the
operation EXCEPT in Squeak and Pharo.  It's not that an alias should
be added.  What should *really* be done is that #match: should be
*renamed* to #skipToAll:.  This will
 - improve compatibility
 - reduce confusion
 - improve navigability

At the moment, for example, it is much harder to discover the
consequences of renaming the #match: method than it should be
because not just people but the system itself confuses the stream
#match: with the String #match:.

The *real* API bloat in PositionableStream is that it covers
both positionable input streams (which can implement #skipToAll:)
and positionable output streams (which cannot), so that there
are way too many methods in the interface of a WriteStream that
cannot possibly work in any state of the receiver.

With Trait support in Pharo, it is long past time that ReadStreams
(and files opened for input only) stopped answering true to
respondsTo: #nextPut:, and that WriteStreams (and files opened for
output only) stopped answering true to respondsTo: #next.
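
One quick way to see what the paragraph above is complaining about is to ask the streams themselves in a playground. A small sketch using only #respondsTo:, which every object understands; the answers are left out because they depend on the image, though the complaint above implies both come back true in Pharo:

```smalltalk
"Evaluate each expression with 'Print it'."
(ReadStream on: 'abc') respondsTo: #nextPut:.
(WriteStream on: String new) respondsTo: #next.
```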

THAT bloat dwarfs a compatibility method.


On Mon, 3 Jun 2019 at 03:04, Sven Van Caekenberghe <[hidden email]> wrote:
Why add an alias ? The API is too wide as it is already.

Note that most current implementations of #upToAll: already use words like match, so #match: is not that crazy.

Yes it should be possible to talk about naming, but just adding aliases, no.

My opinion, of course.

> On 2 Jun 2019, at 04:33, Richard O'Keefe <[hidden email]> wrote:
>
> skipToAll: aCollection
>    "Set the receiver's position to just after the next occurrence of aCollection
>     in the receiver's future values and answer true.  If there is no such
>     occurrence, answer false.  In either case, left the postion where #upToAll:
>     would have left it."
>     ^self match: aCollection
>
> Sorry about the incomplete message.
> #match: is such a bad name for this operation that the method comment has to
> go to some trouble to explain that it is nothing like #match: for Strings.
>
>
> On Sun, 2 Jun 2019 at 14:29, Richard O'Keefe <[hidden email]> wrote:
> To get #skipToAll: in Pharo, add this to PositionableStream.
>
> skipToAll: aCollection
>    "Set the receiver's to just after the next occcurrence of aCollection
>     in the receiver's future values and answer true.  If there is no such
>
>
> On Sun, 2 Jun 2019 at 07:50, Tim Mackinnon <[hidden email]> wrote:
> Interesting - there is no #skipToAll: in pharo, I wonder why not? It sounds like what I was looking for - and I’m surprised its not there. There is #skipTo:  for an object (which sounds right, just not the string equivalent).
>
> I’m not doing anything special, just want to take some lines from the end of a class comment and use them in an exercism exercise - but its not different than many applications - find some tag and use the text after it. I”m kind of surprised its not in Pharo.
>
> Tim
>
>
>> On 1 Jun 2019, at 12:25, Richard O'Keefe <[hidden email]> wrote:
>>
>> If you want to move around in strings, you might want to use a ReadStream.



Re: Find after in strings?

Sven Van Caekenberghe-2
To each his own opinion, #match: is not that bad a name, IMHO.
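For readers who have not met both selectors, here is a minimal sketch of the two meanings of #match: under discussion. It assumes a Pharo 7 image in which PositionableStream>>match: is still present; the sample strings are made up for illustration.

```smalltalk
| stream |
"Stream sense: skip past the next occurrence, no wildcards, case matters."
stream := ReadStream on: 'some text marker and the rest'.
(stream match: 'marker')
    ifTrue: [ stream upToEnd ]    "answers ' and the rest'"
    ifFalse: [ nil ].

"String sense: the receiver is a pattern in which * and # are wildcards."
'*marker*' match: 'some text marker and the rest'.    "answers true"
```

The same selector therefore moves a position in one class and tests a pattern in the other, which is the source of the naming disagreement.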

There is much more bloat than the mixing of reading and writing.

The concept of being positionable is bad too: it makes no sense for network and other non-collection-backed streams.
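As an illustration only, a skip-past operation can be written without any repositioning. The sketch below is not taken from any image: skipPast is a hypothetical helper, written as a block so it can be tried in a Playground, and it relies on nothing but #atEnd and #next. It keeps a sliding window of the last few elements rather than using KMP, so it is simple rather than fast.

```smalltalk
| skipPast stream |
"Hypothetical helper: answers true and leaves aStream just past the first
 occurrence of aCollection, or answers false with aStream at its end."
skipPast := [ :aStream :aCollection | | target window found |
    target := aCollection asArray.
    window := OrderedCollection new.
    found := target isEmpty.
    [ found or: [ aStream atEnd ] ] whileFalse: [
        window addLast: aStream next.
        "Keep only as many recent elements as the target is long."
        window size > target size ifTrue: [ window removeFirst ].
        found := window asArray = target ].
    found ].

stream := ReadStream on: 'the need for vigilance is never-ending'.
skipPast value: stream value: 'end'.    "answers true"
stream upToEnd.                         "answers 'ing'"
```

A ReadStream is used here only so the example runs anywhere; the block never sends #position or #position: to it.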

There is also all the binary, encoding and converting API that assumes a specific type of element, an assumption that is not always correct.

Yes, Traits could help, but all this is a lot of work.

And I feel like this is a lost battle: everybody keeps on asking to put his/her favourite methods back.




Re: Find after in strings?

Tim Mackinnon
Would it be really bad to use the deprecation feature in Pharo to gently migrate to a more common name in Stream? There are 101 senders of match: (in my P7 image, presumably some of them my own usage), and a lot of them actually refer to the String pattern-matching match:. It's big but not immense.
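A rough sketch of what such a migration might look like, assuming #skipToAll: has been added to PositionableStream with the body that #match: has today. The explanation string is made up; #deprecated:transformWith: is the existing Pharo mechanism as I understand it.

```smalltalk
"Hypothetical replacement for PositionableStream>>match:. The old selector
 stays as a forwarder, and the rewrite rule lets Pharo update a sender's
 source when that sender is actually executed."
match: subCollection
    self
        deprecated: 'Use #skipToAll: instead'
        transformWith: '`@receiver match: `@arg'
            -> '`@receiver skipToAll: `@arg'.
    ^ self skipToAll: subCollection
```

Because the rewrite is only triggered when this stream method is reached, senders of the String #match: should never hit it and should be left alone.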

#match: normally carries the connotation of String pattern matching, not stream skipping, so a rename might not be too bad (in my mind at least, which is why I was a bit caught out - although having some equivalent in String would be handy too). Over time the deprecation mechanism would convert the senders through sheer usage, wouldn't it?
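On the String side, the selector proposed at the start of this thread could be sketched as below. It is a hypothetical extension method, not something Pharo provides, and it simply wraps the #findString: idiom from the original question.

```smalltalk
"Hypothetical extension method on String; not part of Pharo."
findAfter: aString ifAbsent: aBlock
    "Answer the index just past the first occurrence of aString,
     or the value of aBlock if aString does not occur."
    | start |
    start := self findString: aString.
    ^ start > 0
        ifTrue: [ start + aString size ]
        ifFalse: [ aBlock value ]
```

With that installed, 'abc marker xyz' findAfter: 'marker' ifAbsent: [ 0 ] would answer 11, the index of the space that follows the marker.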

How do we decide such things? Is there some proposal mechanism?

Tim
