The Inbox: Collections-ul.844.mcz

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

The Inbox: Collections-ul.844.mcz

commits-2
Levente Uzonyi uploaded a new version of Collections to project The Inbox:
http://source.squeak.org/inbox/Collections-ul.844.mcz

==================== Summary ====================

Name: Collections-ul.844
Author: ul
Time: 19 July 2019, 12:08:51.94435 am
UUID: df5ebfa9-4ebf-4505-8031-afd892a1061c
Ancestors: Collections-mt.843

- added String >> #atOrNil: which uses primitive 63 and returns either the character at the given index or nil when the primtiive fails. This is a faster alternative to #at:ifAbsent: when the absent block would yield nil.

=============== Diff against Collections-mt.843 ===============

Item was added:
+ ----- Method: String>>atOrNil: (in category 'as yet unclassified') -----
+ atOrNil: anIndex
+ "Return the character at anIndex or nil if the index or the argument is not valid."
+
+ <primitive: 63>
+ ^nil!


Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-ul.844.mcz

Chris Muller-3
Is it faster even if the argument to ifAbsent: is simply "nil" instead of "[nil]"?


On Thu, Jul 18, 2019 at 5:13 PM <[hidden email]> wrote:
Levente Uzonyi uploaded a new version of Collections to project The Inbox:
http://source.squeak.org/inbox/Collections-ul.844.mcz

==================== Summary ====================

Name: Collections-ul.844
Author: ul
Time: 19 July 2019, 12:08:51.94435 am
UUID: df5ebfa9-4ebf-4505-8031-afd892a1061c
Ancestors: Collections-mt.843

- added String >> #atOrNil: which uses primitive 63 and returns either the character at the given index or nil when the primtiive fails. This is a faster alternative to #at:ifAbsent: when the absent block would yield nil.

=============== Diff against Collections-mt.843 ===============

Item was added:
+ ----- Method: String>>atOrNil: (in category 'as yet unclassified') -----
+ atOrNil: anIndex
+       "Return the character at anIndex or nil if the index or the argument is not valid."
+
+       <primitive: 63>
+       ^nil!




Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-ul.844.mcz

Levente Uzonyi
On Thu, 18 Jul 2019, Chris Muller wrote:

> Is it faster even if the argument to ifAbsent: is simply "nil" instead of "[nil]"?

More than 4x faster when the index is valid and more than twice as fast
when the index is not valid.

Levente

>
> On Thu, Jul 18, 2019 at 5:13 PM <[hidden email]> wrote:
>       Levente Uzonyi uploaded a new version of Collections to project The Inbox:
>       http://source.squeak.org/inbox/Collections-ul.844.mcz
>
>       ==================== Summary ====================
>
>       Name: Collections-ul.844
>       Author: ul
>       Time: 19 July 2019, 12:08:51.94435 am
>       UUID: df5ebfa9-4ebf-4505-8031-afd892a1061c
>       Ancestors: Collections-mt.843
>
>       - added String >> #atOrNil: which uses primitive 63 and returns either the character at the given index or nil when the primtiive fails. This is a faster alternative to #at:ifAbsent: when the absent block would
>       yield nil.
>
>       =============== Diff against Collections-mt.843 ===============
>
>       Item was added:
>       + ----- Method: String>>atOrNil: (in category 'as yet unclassified') -----
>       + atOrNil: anIndex
>       +       "Return the character at anIndex or nil if the index or the argument is not valid."
>       +
>       +       <primitive: 63>
>       +       ^nil!
>
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-ul.844.mcz

Chris Muller-4
Wow, +1 then.  Amazing how you're able find this kind of performance
improvement in as low-level a method as String>>#at:ifAbsent:.

 - Chris

On Thu, Jul 18, 2019 at 5:58 PM Levente Uzonyi <[hidden email]> wrote:

>
> On Thu, 18 Jul 2019, Chris Muller wrote:
>
> > Is it faster even if the argument to ifAbsent: is simply "nil" instead of "[nil]"?
>
> More than 4x faster when the index is valid and more than twice as fast
> when the index is not valid.
>
> Levente
>
> >
> > On Thu, Jul 18, 2019 at 5:13 PM <[hidden email]> wrote:
> >       Levente Uzonyi uploaded a new version of Collections to project The Inbox:
> >       http://source.squeak.org/inbox/Collections-ul.844.mcz
> >
> >       ==================== Summary ====================
> >
> >       Name: Collections-ul.844
> >       Author: ul
> >       Time: 19 July 2019, 12:08:51.94435 am
> >       UUID: df5ebfa9-4ebf-4505-8031-afd892a1061c
> >       Ancestors: Collections-mt.843
> >
> >       - added String >> #atOrNil: which uses primitive 63 and returns either the character at the given index or nil when the primtiive fails. This is a faster alternative to #at:ifAbsent: when the absent block would
> >       yield nil.
> >
> >       =============== Diff against Collections-mt.843 ===============
> >
> >       Item was added:
> >       + ----- Method: String>>atOrNil: (in category 'as yet unclassified') -----
> >       + atOrNil: anIndex
> >       +       "Return the character at anIndex or nil if the index or the argument is not valid."
> >       +
> >       +       <primitive: 63>
> >       +       ^nil!
> >
> >
> >
> >

Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-ul.844.mcz

marcel.taeumel
Hi Levente,

could you expand the method comment to refer to #at:ifAbsent: and the [nil] example?  Write it in a way that programmers do not end up using #atOrNil: + ifNil: without knowing better. :-)

Best,
Marcel

Am 19.07.2019 01:04:33 schrieb Chris Muller <[hidden email]>:

Wow, +1 then. Amazing how you're able find this kind of performance
improvement in as low-level a method as String>>#at:ifAbsent:.

- Chris

On Thu, Jul 18, 2019 at 5:58 PM Levente Uzonyi wrote:
>
> On Thu, 18 Jul 2019, Chris Muller wrote:
>
> > Is it faster even if the argument to ifAbsent: is simply "nil" instead of "[nil]"?
>
> More than 4x faster when the index is valid and more than twice as fast
> when the index is not valid.
>
> Levente
>
> >
> > On Thu, Jul 18, 2019 at 5:13 PM wrote:
> > Levente Uzonyi uploaded a new version of Collections to project The Inbox:
> > http://source.squeak.org/inbox/Collections-ul.844.mcz
> >
> > ==================== Summary ====================
> >
> > Name: Collections-ul.844
> > Author: ul
> > Time: 19 July 2019, 12:08:51.94435 am
> > UUID: df5ebfa9-4ebf-4505-8031-afd892a1061c
> > Ancestors: Collections-mt.843
> >
> > - added String >> #atOrNil: which uses primitive 63 and returns either the character at the given index or nil when the primtiive fails. This is a faster alternative to #at:ifAbsent: when the absent block would
> > yield nil.
> >
> > =============== Diff against Collections-mt.843 ===============
> >
> > Item was added:
> > + ----- Method: String>>atOrNil: (in category 'as yet unclassified') -----
> > + atOrNil: anIndex
> > + "Return the character at anIndex or nil if the index or the argument is not valid."
> > +
> > +
> > + ^nil!
> >
> >
> >
> >



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-ul.844.mcz

Tobias Pape
In reply to this post by commits-2

Hi Levente

On Thu, 2019-07-18 at 22:13 +0000, [hidden email] wrote:

> Levente Uzonyi uploaded a new version of Collections to project The
> Inbox:
> http://source.squeak.org/inbox/Collections-ul.844.mcz
>
> ==================== Summary ====================
>
> Name: Collections-ul.844
> Author: ul
> Time: 19 July 2019, 12:08:51.94435 am
> UUID: df5ebfa9-4ebf-4505-8031-afd892a1061c
> Ancestors: Collections-mt.843
>
> - added String >> #atOrNil: which uses primitive 63 and returns
> either the character at the given index or nil when the primtiive
> fails. This is a faster alternative to #at:ifAbsent: when the absent
> block would yield nil.
>

Could you tell me the use case here?
This sounds like a convoluted range check.
I mean, yes it would be faster to do

 knorz := 'foo'
 index := 32
 ^ knorz atOrNil: index

but I would rather see

 knorz := 'foo'
 index := 32
 ^ index <= knorz size ifTrue:  [knorz at: size]

or more robust even


 knorz := 'foo'
 index := 32
 ^ knorz atPin: 32

(even though that changes the semantic)

So, can you tell me the use case here?
Best regards
        -Tobias

PS: That said, I think using #at: for String in the first place will
bite us down the road, whether we want to change the String
representation at some point (ropes, immutables, values, ...) or just
because Unicode is hard: ('Lèon' at: 3) can surely yield $'̀ ', $e
or $'è'



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-ul.844.mcz

Levente Uzonyi
Hi Tobias,

On Fri, 19 Jul 2019, Tobias wrote:

>
> Hi Levente
>
> On Thu, 2019-07-18 at 22:13 +0000, [hidden email] wrote:
>> Levente Uzonyi uploaded a new version of Collections to project The
>> Inbox:
>> http://source.squeak.org/inbox/Collections-ul.844.mcz
>>
>> ==================== Summary ====================
>>
>> Name: Collections-ul.844
>> Author: ul
>> Time: 19 July 2019, 12:08:51.94435 am
>> UUID: df5ebfa9-4ebf-4505-8031-afd892a1061c
>> Ancestors: Collections-mt.843
>>
>> - added String >> #atOrNil: which uses primitive 63 and returns
>> either the character at the given index or nil when the primtiive
>> fails. This is a faster alternative to #at:ifAbsent: when the absent
>> block would yield nil.
>>
>
> Could you tell me the use case here?
ShoutCore-ul.66 in the Inbox.

> This sounds like a convoluted range check.

Yes, it's a way to avoid the range check being done by the image.

> I mean, yes it would be faster to do
>
> knorz := 'foo'
> index := 32
> ^ knorz atOrNil: index
>
> but I would rather see
>
> knorz := 'foo'
> index := 32
> ^ index <= knorz size ifTrue:  [knorz at: size]
That's not enough. You have to check the other end of the range as well.
And that's what this method tries to avoid.

>
> or more robust even
>
>
> knorz := 'foo'
> index := 32
> ^ knorz atPin: 32

You want to be able to distinguish invalid input and correct input in most
cases. With Shout that's mandatory, so #atPin: wouldn't work.

Levente

>
> (even though that changes the semantic)
>
> So, can you tell me the use case here?
> Best regards
> -Tobias
>
> PS: That said, I think using #at: for String in the first place will
> bite us down the road, whether we want to change the String
> representation at some point (ropes, immutables, values, ...) or just
> because Unicode is hard: ('Lèon' at: 3) can surely yield $'̀ ', $e
> or $'è'

Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-ul.844.mcz

Tobias Pape

> On 19.07.2019, at 15:54, Levente Uzonyi <[hidden email]> wrote:
>
> Hi Tobias,
>
> On Fri, 19 Jul 2019, Tobias wrote:
>
>>
>> Hi Levente
>>
>> On Thu, 2019-07-18 at 22:13 +0000, [hidden email] wrote:
>>> Levente Uzonyi uploaded a new version of Collections to project The
>>> Inbox:
>>> http://source.squeak.org/inbox/Collections-ul.844.mcz
>>>
>>> ==================== Summary ====================
>>>
>>> Name: Collections-ul.844
>>> Author: ul
>>> Time: 19 July 2019, 12:08:51.94435 am
>>> UUID: df5ebfa9-4ebf-4505-8031-afd892a1061c
>>> Ancestors: Collections-mt.843
>>>
>>> - added String >> #atOrNil: which uses primitive 63 and returns
>>> either the character at the given index or nil when the primtiive
>>> fails. This is a faster alternative to #at:ifAbsent: when the absent
>>> block would yield nil.
>>>
>>
>> Could you tell me the use case here?
>
> ShoutCore-ul.66 in the Inbox.

I see. I'd rather have Shout to make sure the index is valid _before_ accessing the string.
at:ifAbsent: is more concise and still idiomatic. It the out-of-range access a frequent case?
If so, we should really tackle _that_ instead of making pouring out nils fast.

>
>> This sounds like a convoluted range check.
>
> Yes, it's a way to avoid the range check being done by the image.
>
>> I mean, yes it would be faster to do
>>
>> knorz := 'foo'
>> index := 32
>> ^ knorz atOrNil: index
>>
>> but I would rather see
>>
>> knorz := 'foo'
>> index := 32
>> ^ index <= knorz size ifTrue:  [knorz at: size]
>
> That's not enough. You have to check the other end of the range as well.
> And that's what this method tries to avoid.

But its telling a different story.
We're going pretty low-level there, and I frankly don't see the benefit. Why should performance be paramount?


>
>>
>> or more robust even
>>
>>
>> knorz := 'foo'
>> index := 32
>> ^ knorz atPin: 32
>
> You want to be able to distinguish invalid input and correct input in most cases. With Shout that's mandatory, so #atPin: wouldn't work.

Well, I don't know. Rinard says otherwise[1]. In most cases, you want some valid data when accessing some collection; atPin doest that quite well.

The case at hand seems just to indicate that writing parsers is boring and hard, and we probably shouldn't have 2 parsers (3, if your have refactorings installed) in the image, that all will disagree slightly and have different characteristics each.

It's cool to make shout faster, however, we shouldn't introduce easy-to-misuse low-level API for that.

Best regards
        -Tobias :)



[1]: https://people.csail.mit.edu/rinard/paper/osdi04.pdf

>
> Levente
>
>>
>> (even though that changes the semantic)
>>
>> So, can you tell me the use case here?
>> Best regards
>> -Tobias
>>
>> PS: That said, I think using #at: for String in the first place will
>> bite us down the road, whether we want to change the String
>> representation at some point (ropes, immutables, values, ...) or just
>> because Unicode is hard: ('Lèon' at: 3) can surely yield $'̀ ', $e
>> or $'è'



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-ul.844.mcz

Levente Uzonyi
On Fri, 19 Jul 2019, Tobias Pape wrote:

>
>> On 19.07.2019, at 15:54, Levente Uzonyi <[hidden email]> wrote:
>>
>> Hi Tobias,
>>
>> On Fri, 19 Jul 2019, Tobias wrote:
>>
>>>
>>> Hi Levente
>>>
>>> On Thu, 2019-07-18 at 22:13 +0000, [hidden email] wrote:
>>>> Levente Uzonyi uploaded a new version of Collections to project The
>>>> Inbox:
>>>> http://source.squeak.org/inbox/Collections-ul.844.mcz
>>>>
>>>> ==================== Summary ====================
>>>>
>>>> Name: Collections-ul.844
>>>> Author: ul
>>>> Time: 19 July 2019, 12:08:51.94435 am
>>>> UUID: df5ebfa9-4ebf-4505-8031-afd892a1061c
>>>> Ancestors: Collections-mt.843
>>>>
>>>> - added String >> #atOrNil: which uses primitive 63 and returns
>>>> either the character at the given index or nil when the primtiive
>>>> fails. This is a faster alternative to #at:ifAbsent: when the absent
>>>> block would yield nil.
>>>>
>>>
>>> Could you tell me the use case here?
>>
>> ShoutCore-ul.66 in the Inbox.
>
> I see. I'd rather have Shout to make sure the index is valid _before_ accessing the string.
> at:ifAbsent: is more concise and still idiomatic. It the out-of-range access a frequent case?
Did you notice that I replaced #at:ifAbsent: sends with #atOrNil: sends
there?

Why does it matter what the frequency of the out-of-range access case is?

> If so, we should really tackle _that_ instead of making pouring out nils fast.

What's wrong with nil here?

>
>>
>>> This sounds like a convoluted range check.
>>
>> Yes, it's a way to avoid the range check being done by the image.
>>
>>> I mean, yes it would be faster to do
>>>
>>> knorz := 'foo'
>>> index := 32
>>> ^ knorz atOrNil: index
>>>
>>> but I would rather see
>>>
>>> knorz := 'foo'
>>> index := 32
>>> ^ index <= knorz size ifTrue:  [knorz at: size]
>>
>> That's not enough. You have to check the other end of the range as well.
>> And that's what this method tries to avoid.
>
> But its telling a different story.
What story is that?

> We're going pretty low-level there, and I frankly don't see the benefit. Why should performance be paramount?

Performance is the benefit. You see it, but you don't think you need it.

>
>
>>
>>>
>>> or more robust even
>>>
>>>
>>> knorz := 'foo'
>>> index := 32
>>> ^ knorz atPin: 32
>>
>> You want to be able to distinguish invalid input and correct input in most cases. With Shout that's mandatory, so #atPin: wouldn't work.
>
> Well, I don't know. Rinard says otherwise[1]. In most cases, you want some valid data when accessing some collection; atPin doest that quite well.
So, how would you use #atPin: with Shout?

>
> The case at hand seems just to indicate that writing parsers is boring and hard, and we probably shouldn't have 2 parsers (3, if your have refactorings installed) in the image, that all will disagree slightly and have different characteristics each.

I'm sure many disagree with you on the boring part.
(IIRC the parser in Pharo replaced all three of those, but its
performance is still worse than Parser's. I remember seeing numbers around
3x slower, but that may have changed).

>
> It's cool to make shout faster, however, we shouldn't introduce easy-to-misuse low-level API for that.

I take that (and the rest of your mail) as a -1 from you for
#atOrNil:.

Levente

>
> Best regards
> -Tobias :)
>
>
>
> [1]: https://people.csail.mit.edu/rinard/paper/osdi04.pdf
>>
>> Levente
>>
>>>
>>> (even though that changes the semantic)
>>>
>>> So, can you tell me the use case here?
>>> Best regards
>>> -Tobias
>>>
>>> PS: That said, I think using #at: for String in the first place will
>>> bite us down the road, whether we want to change the String
>>> representation at some point (ropes, immutables, values, ...) or just
>>> because Unicode is hard: ('Lèon' at: 3) can surely yield $'̀ ', $e
>>> or $'è'

Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-ul.844.mcz

Tobias Pape
Hi Levente :)


Thanks for bearing with me here..


> On 20.07.2019, at 00:49, Levente Uzonyi <[hidden email]> wrote:
>
> On Fri, 19 Jul 2019, Tobias Pape wrote:
>
>>
>>> On 19.07.2019, at 15:54, Levente Uzonyi <[hidden email]> wrote:
>>> Hi Tobias,
>>>>> […]
>>>> Could you tell me the use case here?
>>> ShoutCore-ul.66 in the Inbox.
>>
>> I see. I'd rather have Shout to make sure the index is valid _before_ accessing the string.
>> at:ifAbsent: is more concise and still idiomatic. It the out-of-range access a frequent case?
>
> Did you notice that I replaced #at:ifAbsent: sends with #atOrNil: sends there?

Yes, and I wanted to convey that I disagree.

>
> Why does it matter what the frequency of the out-of-range access case is?

Because optimizing it only makes sense if it is very frequent.

>
>> If so, we should really tackle _that_ instead of making pouring out nils fast.
>
> What's wrong with nil here?

What does nil mean here? It is used to convey "position out of range" but this domain knowledge is lost with using nil.
I don't think we're this deep in system level programming to condone using nil.

I understand that it has been used that way here (and not only in Shout's parser), but I don't think it is a good idea to prolifererate.

>
>>
>>>> This sounds like a convoluted range check.
>>> Yes, it's a way to avoid the range check being done by the image.
>>>> I mean, yes it would be faster to do
>>>> knorz := 'foo'
>>>> index := 32
>>>> ^ knorz atOrNil: index
>>>> but I would rather see
>>>> knorz := 'foo'
>>>> index := 32
>>>> ^ index <= knorz size ifTrue:  [knorz at: size]
>>> That's not enough. You have to check the other end of the range as well.
>>> And that's what this method tries to avoid.
>>
>> But its telling a different story.
>
> What story is that?
>

The method conveys "get something from me or a system-level thingything, whysoever".
Yes, "we" know it is an optimized range check but it is not discoverable by anyone not very used to the code base.

>> We're going pretty low-level there, and I frankly don't see the benefit. Why should performance be paramount?
>
> Performance is the benefit. You see it, but you don't think you need it.

We trade readability and understandability as well as simplicity for performance here. But we don't even know if we
need to be fast here. Who cares if shout takes 0.15 or 0.148 seconds to style a text?
I know Tim complained, and probably rightfully so.
But I am still not convinced that this trade-off will pay out…

>
>>
>>
>>>> or more robust even
>>>> knorz := 'foo'
>>>> index := 32
>>>> ^ knorz atPin: 32
>>> You want to be able to distinguish invalid input and correct input in most cases. With Shout that's mandatory, so #atPin: wouldn't work.
>>
>> Well, I don't know. Rinard says otherwise[1]. In most cases, you want some valid data when accessing some collection; atPin doest that quite well.
>
> So, how would you use #atPin: with Shout?

With shout? probably not at all. This was a more general comment.
We're introducing a public API here, and it should have benefit beyond shout.

>
>>
>> The case at hand seems just to indicate that writing parsers is boring and hard, and we probably shouldn't have 2 parsers (3, if your have refactorings installed) in the image, that all will disagree slightly and have different characteristics each.
>
> I'm sure many disagree with you on the boring part.

Yeah, I meant that in a metaphorical way. Everytime I deal with a parser, it's exciting in the beginning and daunting in the long run…

> (IIRC the parser in Pharo replaced all three of those, but its performance is still worse than Parser's. I remember seeing numbers around 3x slower, but that may have changed).

I'm aware of the parser there. I have no performance infos at hand.
What I was hinting at was that if we optimize one Parser, and not the others, what's the overall benefit? We just end up with differently mysterious pieces of code.
But that's a different thread :)

>
>>
>> It's cool to make shout faster, however, we shouldn't introduce easy-to-misuse low-level API for that.
>
> I take that (and the rest of your mail) as a -1 from you for #atOrNil:.
>

more like a -0.5. I am amazed this works and has the performance implications you told, so I wrote all this to see whether I can convince myself of a +1 but didn't

Best regards
        -Tobias

PS:

What about

at: aNumber orOffLimitSentinel: anObject
        <primitive: 63>
        ^ anObject

Or would the primitive bail?


> Levente
>
>>
>> Best regards
>> -Tobias :)
>>
>>
>>
>> [1]: https://people.csail.mit.edu/rinard/paper/osdi04.pdf
>>> Levente
>>>> (even though that changes the semantic)
>>>> So, can you tell me the use case here?
>>>> Best regards
>>>> -Tobias
>>>> PS: That said, I think using #at: for String in the first place will
>>>> bite us down the road, whether we want to change the String
>>>> representation at some point (ropes, immutables, values, ...) or just
>>>> because Unicode is hard: ('Lèon' at: 3) can surely yield $'̀ ', $e
>>>> or $'è'



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-ul.844.mcz

Levente Uzonyi
On Sat, 20 Jul 2019, Tobias Pape wrote:

> Hi Levente :)
>
>
> Thanks for bearing with me here..

:)

>
>
>> On 20.07.2019, at 00:49, Levente Uzonyi <[hidden email]> wrote:
>>
>> On Fri, 19 Jul 2019, Tobias Pape wrote:
>>
>>>
>>>> On 19.07.2019, at 15:54, Levente Uzonyi <[hidden email]> wrote:
>>>> Hi Tobias,
>>>>>> […]
>>>>> Could you tell me the use case here?
>>>> ShoutCore-ul.66 in the Inbox.
>>>
>>> I see. I'd rather have Shout to make sure the index is valid _before_ accessing the string.
>>> at:ifAbsent: is more concise and still idiomatic. It the out-of-range access a frequent case?
>>
>> Did you notice that I replaced #at:ifAbsent: sends with #atOrNil: sends there?
>
> Yes, and I wanted to convey that I disagree.
>
>>
>> Why does it matter what the frequency of the out-of-range access case is?
>
> Because optimizing it only makes sense if it is very frequent.
This optimizes both cases, but the real benefit comes from optimizing the
most common case: when there's a character at the given index.
So, even if it didn't optimize the out-of-range case, it would still be a
lot faster.

>
>>
>>> If so, we should really tackle _that_ instead of making pouring out nils fast.
>>
>> What's wrong with nil here?
>
> What does nil mean here? It is used to convey "position out of range" but this domain knowledge is lost with using nil.

I think nil is a pretty good extremal element here:
1. it's not a character
2. it's a unique object with a global accessor
3. comparison is extremely quick
4. the compiler supports it with highly optimized branch statements

And the knowledge is not really lost, as all senders handle the extremal
case. So, all uses are localized.

I could use a custom EndOfStream object, but compared to nil, it would
lack property 4, so the code would be more verbose, and the performance
would be marginally worse.

> I don't think we're this deep in system level programming to condone using nil.

It might resemble system programming, but the main difference is that
there's no nil there. :)

>
> I understand that it has been used that way here (and not only in Shout's parser), but I don't think it is a good idea to prolifererate.
>
>>
>>>
>>>>> This sounds like a convoluted range check.
>>>> Yes, it's a way to avoid the range check being done by the image.
>>>>> I mean, yes it would be faster to do
>>>>> knorz := 'foo'
>>>>> index := 32
>>>>> ^ knorz atOrNil: index
>>>>> but I would rather see
>>>>> knorz := 'foo'
>>>>> index := 32
>>>>> ^ index <= knorz size ifTrue:  [knorz at: size]
>>>> That's not enough. You have to check the other end of the range as well.
>>>> And that's what this method tries to avoid.
>>>
>>> But its telling a different story.
>>
>> What story is that?
>>
>
> The method conveys "get something from me or a system-level thingything, whysoever".
> Yes, "we" know it is an optimized range check but it is not discoverable by anyone not very used to the code base.
Would you be fine with it if the method were called #atOrNilWhenTheIndexIsOutOfBounds:?

>
>>> We're going pretty low-level there, and I frankly don't see the benefit. Why should performance be paramount?
>>
>> Performance is the benefit. You see it, but you don't think you need it.
>
> We trade readability and understandability as well as simplicity for performance here. But we don't even know if we

I don't think legibility got worse, but if it did, please point out where
it happened.

> need to be fast here. Who cares if shout takes 0.15 or 0.148 seconds to style a text?
> I know Tim complained, and probably rightfully so.
> But I am still not convinced that this trade-off will pay out…

You can't know it without measuring it.
If you don't feel like rolling your own benchmark, good news: I just wrote
one. Print it: SHParserST80 benchmark.

>
>>
>>>
>>>
>>>>> or more robust even
>>>>> knorz := 'foo'
>>>>> index := 32
>>>>> ^ knorz atPin: 32
>>>> You want to be able to distinguish invalid input and correct input in most cases. With Shout that's mandatory, so #atPin: wouldn't work.
>>>
>>> Well, I don't know. Rinard says otherwise[1]. In most cases, you want some valid data when accessing some collection; atPin doest that quite well.
>>
>> So, how would you use #atPin: with Shout?
>
> With shout? probably not at all. This was a more general comment.
> We're introducing a public API here, and it should have benefit beyond shout.
Well, I simply copied this method from another project of mine, which had
it implemented on SequenceableCollection and used it for all kinds of
arrays.
For example, it could be used to speed up String's #at:ifAbsent: :).

>
>>
>>>
>>> The case at hand seems just to indicate that writing parsers is boring and hard, and we probably shouldn't have 2 parsers (3, if your have refactorings installed) in the image, that all will disagree slightly and have different characteristics each.
>>
>> I'm sure many disagree with you on the boring part.
>
> Yeah, I meant that in a metaphorical way. Everytime I deal with a parser, it's exciting in the beginning and daunting in the long run…
>
>> (IIRC the parser in Pharo replaced all three of those, but its performance is still worse than Parser's. I remember seeing numbers around 3x slower, but that may have changed).
>
> I'm aware of the parser there. I have no performance infos at hand.
> What I was hinting at was that if we optimize one Parser, and not the others, what's the overall benefit? We just end up with differently mysterious pieces of code.
The three parsers are specialized on different areas. Shout's goal was to
be fast and to produce enough output for styling. So, improving its
performance is a natural thing, at least to me.

> But that's a different thread :)
>
>>
>>>
>>> It's cool to make shout faster, however, we shouldn't introduce easy-to-misuse low-level API for that.
>>
>> I take that (and the rest of your mail) as a -1 from you for #atOrNil:.
>>
>
> more like a -0.5. I am amazed this works and has the performance implications you told, so I wrote all this to see whether I can convince myself of a +1 but didn't
>
> Best regards
> -Tobias
>
> PS:
>
> What about
>
> at: aNumber orOffLimitSentinel: anObject
> <primitive: 63>
> ^ anObject
>
> Or would the primitive bail?
The primitive fails when the number of arguments is not exactly one.
I tried that trick long ago with #at:ifAbsent:, even suggested changing
the VM code, but now, with the JIT, that would be a bit harder.

Levente

P.S.: I'm considering a different approach which uses a similar trick, but
no one would bother because it wouldn't introduce any new methods (and
that trick is used in many other places). But, I didn't want to rewrite
Shout...

>
>
>> Levente
>>
>>>
>>> Best regards
>>> -Tobias :)
>>>
>>>
>>>
>>> [1]: https://people.csail.mit.edu/rinard/paper/osdi04.pdf
>>>> Levente
>>>>> (even though that changes the semantic)
>>>>> So, can you tell me the use case here?
>>>>> Best regards
>>>>> -Tobias
>>>>> PS: That said, I think using #at: for String in the first place will
>>>>> bite us down the road, whether we want to change the String
>>>>> representation at some point (ropes, immutables, values, ...) or just
>>>>> because Unicode is hard: ('Lèon' at: 3) can surely yield $'̀ ', $e
>>>>> or $'è'

Reply | Threaded
Open this post in threaded view
|

Shout performance (was Re: The Inbox: Collections-ul.844.mcz)

timrowledge
In reply to this post by Tobias Pape


> On 2019-07-19, at 4:12 PM, Tobias Pape <[hidden email]> wrote:
>  Who cares if shout takes 0.15 or 0.148 seconds to style a text?
> I know Tim complained, and probably rightfully so.

Well, in this particular case, it was a problem of Shout taking >1sec or, more precisely, having Shout enabled causing the typing experience on a slow machine  - a Pi 1 - to be very, very, poor. As in so slow that it was difficult to actually type without getting lost because of the delays, bursts of characters and so on. Only by completely unloading it could I get any acceptable editing experience. Now clearly, a Pi3 with a Cog VM is much, much, faster, so there is no particular practical issue any more.


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- Monorail doesn't go all the way to Tomorrowland.



Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-ul.844.mcz

marcel.taeumel
In reply to this post by Tobias Pape
Hi, there.

Who cares if shout takes 0.15 or 0.148 seconds to style a text?

Given that an entire Morphic cycle should take less than 16 milliseconds for the sake of responsiveness, such a difference would matter very much. Luckily, in my image, Shout is way faster to style the average piece of source code. :-) For about 400 characters it takes 192 microseconds. Thinking of slower systems, "nano" would be better to not be so close to "milli". ;-)

Best,
Marcel

Am 20.07.2019 01:12:25 schrieb Tobias Pape <[hidden email]>:

Hi Levente :)


Thanks for bearing with me here..


> On 20.07.2019, at 00:49, Levente Uzonyi wrote:
>
> On Fri, 19 Jul 2019, Tobias Pape wrote:
>
>>
>>> On 19.07.2019, at 15:54, Levente Uzonyi wrote:
>>> Hi Tobias,
>>>>> […]
>>>> Could you tell me the use case here?
>>> ShoutCore-ul.66 in the Inbox.
>>
>> I see. I'd rather have Shout to make sure the index is valid _before_ accessing the string.
>> at:ifAbsent: is more concise and still idiomatic. It the out-of-range access a frequent case?
>
> Did you notice that I replaced #at:ifAbsent: sends with #atOrNil: sends there?

Yes, and I wanted to convey that I disagree.

>
> Why does it matter what the frequency of the out-of-range access case is?

Because optimizing it only makes sense if it is very frequent.

>
>> If so, we should really tackle _that_ instead of making pouring out nils fast.
>
> What's wrong with nil here?

What does nil mean here? It is used to convey "position out of range" but this domain knowledge is lost with using nil.
I don't think we're this deep in system level programming to condone using nil.

I understand that it has been used that way here (and not only in Shout's parser), but I don't think it is a good idea to prolifererate.

>
>>
>>>> This sounds like a convoluted range check.
>>> Yes, it's a way to avoid the range check being done by the image.
>>>> I mean, yes it would be faster to do
>>>> knorz := 'foo'
>>>> index := 32
>>>> ^ knorz atOrNil: index
>>>> but I would rather see
>>>> knorz := 'foo'
>>>> index := 32
>>>> ^ index <= knorz="" size="" iftrue:="" [knorz="" at:="">
>>> That's not enough. You have to check the other end of the range as well.
>>> And that's what this method tries to avoid.
>>
>> But its telling a different story.
>
> What story is that?
>

The method conveys "get something from me or a system-level thingything, whysoever".
Yes, "we" know it is an optimized range check but it is not discoverable by anyone not very used to the code base.

>> We're going pretty low-level there, and I frankly don't see the benefit. Why should performance be paramount?
>
> Performance is the benefit. You see it, but you don't think you need it.

We trade readability and understandability as well as simplicity for performance here. But we don't even know if we
need to be fast here. Who cares if shout takes 0.15 or 0.148 seconds to style a text?
I know Tim complained, and probably rightfully so.
But I am still not convinced that this trade-off will pay out…

>
>>
>>
>>>> or more robust even
>>>> knorz := 'foo'
>>>> index := 32
>>>> ^ knorz atPin: 32
>>> You want to be able to distinguish invalid input and correct input in most cases. With Shout that's mandatory, so #atPin: wouldn't work.
>>
>> Well, I don't know. Rinard says otherwise[1]. In most cases, you want some valid data when accessing some collection; atPin doest that quite well.
>
> So, how would you use #atPin: with Shout?

With shout? probably not at all. This was a more general comment.
We're introducing a public API here, and it should have benefit beyond shout.

>
>>
>> The case at hand seems just to indicate that writing parsers is boring and hard, and we probably shouldn't have 2 parsers (3, if your have refactorings installed) in the image, that all will disagree slightly and have different characteristics each.
>
> I'm sure many disagree with you on the boring part.

Yeah, I meant that in a metaphorical way. Everytime I deal with a parser, it's exciting in the beginning and daunting in the long run…

> (IIRC the parser in Pharo replaced all three of those, but its performance is still worse than Parser's. I remember seeing numbers around 3x slower, but that may have changed).

I'm aware of the parser there. I have no performance infos at hand.
What I was hinting at was that if we optimize one Parser, and not the others, what's the overall benefit? We just end up with differently mysterious pieces of code.
But that's a different thread :)

>
>>
>> It's cool to make shout faster, however, we shouldn't introduce easy-to-misuse low-level API for that.
>
> I take that (and the rest of your mail) as a -1 from you for #atOrNil:.
>

more like a -0.5. I am amazed this works and has the performance implications you told, so I wrote all this to see whether I can convince myself of a +1 but didn't

Best regards
-Tobias

PS:

What about

at: aNumber orOffLimitSentinel: anObject

^ anObject

Or would the primitive bail?


> Levente
>
>>
>> Best regards
>> -Tobias :)
>>
>>
>>
>> [1]: https://people.csail.mit.edu/rinard/paper/osdi04.pdf
>>> Levente
>>>> (even though that changes the semantic)
>>>> So, can you tell me the use case here?
>>>> Best regards
>>>> -Tobias
>>>> PS: That said, I think using #at: for String in the first place will
>>>> bite us down the road, whether we want to change the String
>>>> representation at some point (ropes, immutables, values, ...) or just
>>>> because Unicode is hard: ('Lèon' at: 3) can surely yield $'̀ ', $e
>>>> or $'è'





Reply | Threaded
Open this post in threaded view
|

Re: The Inbox: Collections-ul.844.mcz

marcel.taeumel
Hi, all! :-)

Considering this specific addition to String, I am against it because - without care - it promotes ifNil-checks.

-1 for #atOrNil:

I would like to see a fast Shout without such "hacks". :-)

Best,
Marcel

Am 20.07.2019 10:18:14 schrieb Marcel Taeumel <[hidden email]>:

Hi, there.

Who cares if shout takes 0.15 or 0.148 seconds to style a text?

Given that an entire Morphic cycle should take less than 16 milliseconds for the sake of responsiveness, such a difference would matter very much. Luckily, in my image, Shout is way faster to style the average piece of source code. :-) For about 400 characters it takes 192 microseconds. Thinking of slower systems, "nano" would be better to not be so close to "milli". ;-)

Best,
Marcel

Am 20.07.2019 01:12:25 schrieb Tobias Pape <[hidden email]>:

Hi Levente :)


Thanks for bearing with me here..


> On 20.07.2019, at 00:49, Levente Uzonyi wrote:
>
> On Fri, 19 Jul 2019, Tobias Pape wrote:
>
>>
>>> On 19.07.2019, at 15:54, Levente Uzonyi wrote:
>>> Hi Tobias,
>>>>> […]
>>>> Could you tell me the use case here?
>>> ShoutCore-ul.66 in the Inbox.
>>
>> I see. I'd rather have Shout to make sure the index is valid _before_ accessing the string.
>> at:ifAbsent: is more concise and still idiomatic. It the out-of-range access a frequent case?
>
> Did you notice that I replaced #at:ifAbsent: sends with #atOrNil: sends there?

Yes, and I wanted to convey that I disagree.

>
> Why does it matter what the frequency of the out-of-range access case is?

Because optimizing it only makes sense if it is very frequent.

>
>> If so, we should really tackle _that_ instead of making pouring out nils fast.
>
> What's wrong with nil here?

What does nil mean here? It is used to convey "position out of range" but this domain knowledge is lost with using nil.
I don't think we're this deep in system level programming to condone using nil.

I understand that it has been used that way here (and not only in Shout's parser), but I don't think it is a good idea to prolifererate.

>
>>
>>>> This sounds like a convoluted range check.
>>> Yes, it's a way to avoid the range check being done by the image.
>>>> I mean, yes it would be faster to do
>>>> knorz := 'foo'
>>>> index := 32
>>>> ^ knorz atOrNil: index
>>>> but I would rather see
>>>> knorz := 'foo'
>>>> index := 32
>>>> ^ index <= knorz="" size="" iftrue:="" [knorz="" at:="">
>>> That's not enough. You have to check the other end of the range as well.
>>> And that's what this method tries to avoid.
>>
>> But its telling a different story.
>
> What story is that?
>

The method conveys "get something from me or a system-level thingything, whysoever".
Yes, "we" know it is an optimized range check but it is not discoverable by anyone not very used to the code base.

>> We're going pretty low-level there, and I frankly don't see the benefit. Why should performance be paramount?
>
> Performance is the benefit. You see it, but you don't think you need it.

We trade readability and understandability as well as simplicity for performance here. But we don't even know if we
need to be fast here. Who cares if shout takes 0.15 or 0.148 seconds to style a text?
I know Tim complained, and probably rightfully so.
But I am still not convinced that this trade-off will pay out…

>
>>
>>
>>>> or more robust even
>>>> knorz := 'foo'
>>>> index := 32
>>>> ^ knorz atPin: 32
>>> You want to be able to distinguish invalid input and correct input in most cases. With Shout that's mandatory, so #atPin: wouldn't work.
>>
>> Well, I don't know. Rinard says otherwise[1]. In most cases, you want some valid data when accessing some collection; atPin doest that quite well.
>
> So, how would you use #atPin: with Shout?

With shout? probably not at all. This was a more general comment.
We're introducing a public API here, and it should have benefit beyond shout.

>
>>
>> The case at hand seems just to indicate that writing parsers is boring and hard, and we probably shouldn't have 2 parsers (3, if your have refactorings installed) in the image, that all will disagree slightly and have different characteristics each.
>
> I'm sure many disagree with you on the boring part.

Yeah, I meant that in a metaphorical way. Everytime I deal with a parser, it's exciting in the beginning and daunting in the long run…

> (IIRC the parser in Pharo replaced all three of those, but its performance is still worse than Parser's. I remember seeing numbers around 3x slower, but that may have changed).

I'm aware of the parser there. I have no performance infos at hand.
What I was hinting at was that if we optimize one Parser, and not the others, what's the overall benefit? We just end up with differently mysterious pieces of code.
But that's a different thread :)

>
>>
>> It's cool to make shout faster, however, we shouldn't introduce easy-to-misuse low-level API for that.
>
> I take that (and the rest of your mail) as a -1 from you for #atOrNil:.
>

more like a -0.5. I am amazed this works and has the performance implications you told, so I wrote all this to see whether I can convince myself of a +1 but didn't

Best regards
-Tobias

PS:

What about

at: aNumber orOffLimitSentinel: anObject

^ anObject

Or would the primitive bail?


> Levente
>
>>
>> Best regards
>> -Tobias :)
>>
>>
>>
>> [1]: https://people.csail.mit.edu/rinard/paper/osdi04.pdf
>>> Levente
>>>> (even though that changes the semantic)
>>>> So, can you tell me the use case here?
>>>> Best regards
>>>> -Tobias
>>>> PS: That said, I think using #at: for String in the first place will
>>>> bite us down the road, whether we want to change the String
>>>> representation at some point (ropes, immutables, values, ...) or just
>>>> because Unicode is hard: ('Lèon' at: 3) can surely yield $'̀ ', $e
>>>> or $'è'