Noob Question - slicing


Noob Question - slicing

Adventurer

Hi All,

I was trying to figure out an elegant way to slice strings (Python style), and came up empty.

What is the simplest way to copy characters between positions 4 and 8 from a string in Pharo?

Craig


Re: Noob Question - slicing

Offray Vladimir Luna Cárdenas-2


On 5/3/19 17:59, Craig Johnson wrote:

> What is the simplest way to copy characters between positions 4 and 8 from a string in Pharo?

You can use:

'Strings are intuitive' copyFrom: 4 to: 8

which should give you: 'ings '.
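To make the indexing convention concrete, here is a small Playground sketch (assuming Pharo 7 or later): positions start at 1 and both bounds of #copyFrom:to: are included, so a Python slice s[a:b] corresponds to copyFrom: a + 1 to: b. Unlike Python slicing, out-of-range bounds are not clamped; the call signals an error, and since the exact error class may differ between versions the sketch simply catches Error.

```smalltalk
| s |
s := 'Strings are intuitive'.
"1-based, inclusive on both ends: characters 4, 5, 6, 7 and 8."
s copyFrom: 4 to: 8.            "'ings '"
"Python's s[3:8] (0-based, end-exclusive) names the same five characters."
s copyFrom: 3 + 1 to: 8.        "'ings '"
"A stop index just before the start index gives an empty copy."
s copyFrom: 5 to: 4.            "''"
"Out-of-range bounds are an error rather than being clamped."
[ s copyFrom: 20 to: 30 ]
	on: Error
	do: [ :e | 'out of range' ].    "'out of range'"
```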

There is an excellent tutorial about Strings in [1].

[1] http://stephane.ducasse.free.fr/FreeBooks/ByExample/14%20-%20Chapter%2012%20-%20Strings.pdf

If you share with us the results of the internet searches you did to answer the question yourself, and why they didn't work for you, we can help you better (see [2], a kind of updated classic on that).

[2] http://www.catb.org/esr/faqs/smart-questions.html

Cheers,

Offray


Re: Noob Question - slicing

Richard O'Keefe
In reply to this post by Adventurer
As someone else already pointed out, the standard way to copy
part of any array-like sequence is
   aSequence copyFrom: firstIncludedIndex to: lastIncludedIndex

How could you have found this by yourself?

From the background, either
  click, then select Tools, then select Playground
or hold Ctrl down while typing OW (for Open Workspace).
In the new Playground/workspace, type
  String
then Ctrl-B.  You now have a five-pane browser open on
String.

About halfway down the third pane along the top you will see
  converting
  copying
  displaying
Click on copying.

Guess what?  You WON'T see anything that looks relevant!
Maybe it's in an ancestral class.  So look at the buttons
under the top row of panes:
 All packages  Scoped view | Flat Hier | Inst side Class side | ...
Hier looks promising.  Click it.

Now in the second pane you will see
ProtoObject
  Object
    Collection
      SequenceableCollection
        ArrayedCollection
          String
            ...
Select the parent class, ArrayedCollection.
Nope, nothing promising there either!
Select the grandparent class, SequenceableCollection.
And now the 'copying' method category has quite a few
potentially interesting methods, including #copyFrom:to:.
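The same search can also be done from the Playground with reflection. A minimal sketch, using the standard Behavior messages #allSuperclasses, #canUnderstand: and #lookupSelector:; note that a string literal is an instance of ByteString, a subclass of String, so the sketch asks that class. The exact superclass list and defining class may vary between Pharo versions, so no output is claimed for those two lines.

```smalltalk
"The class of a literal string is ByteString, a subclass of String."
'abc' class.                                     "ByteString"
"The chain the Hier button displays, nearest superclass first."
ByteString allSuperclasses.
"Does the class respond to the selector at all, inherited or not?"
ByteString canUnderstand: #copyFrom:to:.         "true"
"Which class actually holds the method that will run?"
(ByteString lookupSelector: #copyFrom:to:) methodClass.
```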

There are four other related methods that I would argue
are in the wrong category:
  allButFirst
  allButFirst: count
  allButLast
  allButLast: count
They are in the 'accessing' category.
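A quick Playground check of how these accessors, together with #first: and #last:, relate to #copyFrom:to: (a sketch; the results shown were worked out by hand for this ten-character string):

```smalltalk
| s |
s := '1234567890'.
s first: 3.                    "'123'"
s last: 3.                     "'890'"
s allButFirst.                 "'234567890'"
s allButLast.                  "'123456789'"
s allButFirst: 3.              "'4567890'"
s allButLast: 3.               "'1234567'"
"Composing two of them reproduces copyFrom: 4 to: 8."
(s allButFirst: 3) first: 5.   "'45678'"
s copyFrom: 4 to: 8.           "'45678'"
```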

If you want something a bit more flexible,
you could add
    drop: d take: t
      "Discard the first |d| elements of the receiver if d >= 0.
       Discard the last  |d| elements of the receiver if d <= 0.
       Return  the first |t| elements of the result   if t >= 0.
       Return  the last  |t| elements of the result   if t <= 0.
       The result is the same kind of collection as the receiver."
      |lb ub n|
      n := self size.
      d abs >= n ifTrue: [^self copyEmpty].
      d < 0
        ifTrue:  [lb := 1.   ub := n + d]
        ifFalse: [lb := d+1. ub := n].
      ub - lb + 1 <= t abs ifFalse: [
        t < 0
          ifTrue:  [lb := ub + t + 1]
          ifFalse: [ub := lb + t - 1]].
      ^self copyFrom: lb to: ub
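To try the method from the Playground, one option is to compile it into SequenceableCollection under 'copying'; that class and category are this sketch's assumption, since any class that understands #size, #copyEmpty and #copyFrom:to: would do. Evaluate the compile: statement first, then the examples, whose results were traced through the method's branches by hand.

```smalltalk
"Install the method (comment omitted) in SequenceableCollection."
SequenceableCollection
	compile: 'drop: d take: t
	| lb ub n |
	n := self size.
	d abs >= n ifTrue: [ ^ self copyEmpty ].
	d < 0
		ifTrue: [ lb := 1. ub := n + d ]
		ifFalse: [ lb := d + 1. ub := n ].
	ub - lb + 1 <= t abs ifFalse: [
		t < 0
			ifTrue: [ lb := ub + t + 1 ]
			ifFalse: [ ub := lb + t - 1 ] ].
	^ self copyFrom: lb to: ub'
	classified: 'copying'.

"Drop the first 3, then take the first 5: same as copyFrom: 4 to: 8."
'1234567890' drop: 3 take: 5.       "'45678'"
"Drop the last 2, then take the first 3."
'1234567890' drop: -2 take: 3.      "'123'"
"Drop nothing, then take the last 3."
'1234567890' drop: 0 take: -3.      "'890'"
"Asking for more than remains answers what remains."
'1234567890' drop: 7 take: 10.      "'890'"
"Dropping the whole size or more answers an empty copy."
'1234567890' drop: 12 take: 1.      "''"
"Any sequenceable collection works, not just strings."
#(1 2 3 4 5) drop: 1 take: 2.       "#(2 3)"
```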

Now I would like to suggest that you not use anything like this
*directly*.  Go back to the Playground, and evaluate

  String with: (Character codePoint: 256) <Ctrl-P>

The answer is 'Ā'. Pharo supports Unicode.  Now try

  String with: $A with: (Character codePoint: 16r0304)

In Pharo, the result looks like A followed by a separate
macron, but when it's pasted into HTML it displays
correctly as 'Ā'.  Pharo doesn't *quite* support Unicode.
If it did, the two strings with different lengths and no
codepoint in common would display exactly the same.
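The difference is easy to see by asking each string for its size and codepoints. A minimal Playground sketch of the two strings above; equality and slicing both work on codepoints, not on what the reader sees:

```smalltalk
| precomposed decomposed |
"U+0100 LATIN CAPITAL LETTER A WITH MACRON: one codepoint."
precomposed := String with: (Character codePoint: 16r0100).
"U+0041 followed by U+0304 COMBINING MACRON: two codepoints."
decomposed := String with: $A with: (Character codePoint: 16r0304).
precomposed size.                                   "1"
decomposed size.                                    "2"
precomposed asArray collect: [ :c | c codePoint ].  "#(256)"
decomposed asArray collect: [ :c | c codePoint ].   "#(65 772)"
"Different codepoint sequences, so not equal, even though a
 Unicode-aware renderer shows both as the same letter."
precomposed = decomposed.                           "false"
"Slicing by index can cut a user-visible character in half."
decomposed copyFrom: 1 to: 1.                       "'A', macron lost"
```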

The Unicode standard finds it necessary to distinguish
between characters, glyphs, graphemes, grapheme clusters,
codepoints, and a couple of other things.  A Smalltalk
String is a sequence of *codepoints*, not a sequence of
"characters".  There is no upper bound on the number of
codepoints that may be needed to encode one "character"
as the end user sees it, and from there on it gets
*complicated*.

For over 20 years, it hasn't really made sense to think
of a string as a simply indexed sequence of characters.
Integer indices are a useful implementation-level detail
for remembering bounds from some "higher level" matching
technique, but are much less useful than you might expect.
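As a small illustration of that last point, here is a sketch in which a search (#indexOfSubCollection:) supplies the bounds and #copyFrom:to: only replays them; #substrings shows a case where no index is needed at all. The word searched for is just an example.

```smalltalk
| s start stop |
s := 'Strings are intuitive'.
"Let a search find the bounds instead of hard-coding 9 and 11."
start := s indexOfSubCollection: 'are'.    "9"
stop := start + 'are' size - 1.            "11"
s copyFrom: start to: stop.                "'are'"
"Splitting on whitespace needs no index arithmetic at all."
s substrings.
```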

On Wed, 6 Mar 2019 at 12:00, Craig Johnson <[hidden email]> wrote:

> What is the simplest way to copy characters between positions 4 and 8 from a string in Pharo?

Re: Noob Question - slicing

Tim Mackinnon
Nice reply, Richard. Do you ever post any of these on a blog? The one below would be a great one to point to from Exercism...

On 6 Mar 2019, at 12:48, Richard O'Keefe <[hidden email]> wrote:

As someone else already pointed out, the standard way to copy
part of any array-like sequence is
   aSequence copyFrom: firstIncludedIndex to: lastIncludedIndex


Re: Noob Question - slicing

Sven Van Caekenberghe-2
In reply to this post by Richard O'Keefe
In Calypso, select the checkbox next to the virtual category 'instance side', which will show all inherited methods (below Object) in the same browser, scroll to copy.

#drop:take: sounds totally obscure to me, way worse than #copyFrom:to:, #first: or #last:.

And what is your point with the Unicode rant?

What you are pointing at is called Normalization. It is not part of the standard image because the necessary databases and fonts are big. But it can be done in Pharo just fine.

https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43

What is it with people complaining that they did not find something where they expected it? Every language/environment/library/framework has a learning curve.

The current class/method categorisations are what they are (historically and by design). There is more to life than string processing: many operations make sense in other contexts too, and that is a powerful thing.

> On 6 Mar 2019, at 13:48, Richard O'Keefe <[hidden email]> wrote:
>
> As someone else already pointed out, the standard way to copy
> part of any array-like sequence is
>    aSequence copyFrom: firstIncludedIndex to: lastIncludedIndex

Re: Noob Question - slicing

K K Subbu
In reply to this post by Richard O'Keefe
> On Wed, 6 Mar 2019 at 12:00, Craig Johnson <[hidden email]
> <mailto:[hidden email]>> wrote:

>     What is the simplest way to copy characters between positions 4 and
>     8 from a string in Pharo?

'1234567890' copyFrom: 5 to: 8 "5678"

Page 207 of the Updated Pharo by Example book explains more about substrings.
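This example happens to match Python's convention: in Python, '1234567890'[4:8] is '5678', which is copyFrom: 5 to: 8 here. If you want Python-style slices directly, a block along these lines would emulate them. pySlice is a hypothetical helper, not part of Pharo; it handles start and stop with negative indices and clamping, but not Python's step argument.

```smalltalk
"Hypothetical helper, not part of Pharo: emulate Python's seq[a:b]
 (0-based start, exclusive stop, negative indices count from the end,
 out-of-range bounds are clamped)."
| pySlice |
pySlice := [ :seq :a :b | | n lo hi |
	n := seq size.
	lo := a < 0 ifTrue: [ n + a ] ifFalse: [ a ].
	hi := b < 0 ifTrue: [ n + b ] ifFalse: [ b ].
	lo := (lo max: 0) min: n.
	hi := (hi max: 0) min: n.
	hi <= lo
		ifTrue: [ seq copyEmpty ]
		ifFalse: [ seq copyFrom: lo + 1 to: hi ] ].
pySlice value: '1234567890' value: 4 value: 8.     "'5678'"
pySlice value: '1234567890' value: -3 value: 10.   "'890'"
pySlice value: '1234567890' value: 8 value: 20.    "'90'"
pySlice value: '1234567890' value: 6 value: 2.     "''"
```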

HTH .. Subbu


Re: Noob Question - slicing

Adventurer
From: Pharo-users [mailto:[hidden email]] On Behalf Of K K Subbu
Sent: Wednesday, 06 March 2019 16:45
To: [hidden email]
Subject: Re: [Pharo-users] Noob Question - slicing


> '1234567890' copyFrom: 5 to: 8 "5678"
>
> Page 207 in Updated Pharo by Example book explains more on substrings.
>
> HTH .. Subbu

Thank you for this, it is perfectly clear.

I disregarded the Pharo by Example book linked from the Pharo.org page, because when I tried to use the examples in that book to read text files, they did not work under Pharo 7. The book is for Pharo 5.

My mistake.

If there's a later version of that book available, I'd love to know.

Craig