Re: [Pharo-project] #at:ifAbsent: for Array(s) could be faster

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-project] #at:ifAbsent: for Array(s) could be faster

Eliot Miranda-2
 


On Wed, Nov 2, 2011 at 11:33 AM, Levente Uzonyi <[hidden email]> wrote:
On Wed, 2 Nov 2011, Igor Stasenko wrote:

Here the original implementation, which Array inherits from
SequenceableCollection:

at: index ifAbsent: exceptionBlock
       "Answer the element at my position index. If I do not contain an element
       at index, answer the result of evaluating the argument, exceptionBlock."

       (index between: 1 and: self size) ifTrue: [^ self at: index].
       ^ exceptionBlock value


the thing is, that #at: primivite also doing a range checking and
fails if index is invalid, so if index is valid, as result we
performing a range checking twice instead of just once.

Here the simple implementation how to make things faster:
fat: index
       "Primitive. Assumes receiver is indexable. Answer the value of an
       indexable element in the receiver. Fail if the argument index is not an
       Integer or is out of bounds. Essential. See Object documentation
       whatIsAPrimitive. Read the class comment for a discussion about that the fact
       that the index can be a float."

       <primitive: 60>
       ^ #plopp

fat: index ifAbsent: aBlock
       | x |
       x := (self fat: index).
       x == #plopp ifTrue: [ ^ aBlock value ].
       ^ x

And speedup is almost 2 times:

[ 1000000 timesRepeat: [ #(1 2 4 5 56 67 67) at: 1 ifAbsent: nil ] ] timeToRun

39

[ 1000000 timesRepeat: [ #(1 2 4 5 56 67 67) at: 1 ifAbsent: nil ] ] timeToRun

20

Even when index is out of range, it is still much faster:

[ 1000000 timesRepeat: [ #(1 2 4 5 56 67 67) at: 100 ifAbsent: nil ] ] timeToRun

35

[ 1000000 timesRepeat: [ #(1 2 4 5 56 67 67) fat: 100 ifAbsent: nil ]
] timeToRun

24

The problem, of course, that trick with #plopp is a dirty hack.
Because if array include an #plopp object as element, it will report
as it absent,
instead of answering it.
There is no way to ensure that any arbitrary object are never included
as element in array, so this is no-go for generic replacement.
But in controlled environment, when you know that none of arrays which
you using with #fat:ifAbsent: will ever include #plopp (or pick your
own unique object)
it may be a good alternative.

I've been thinking about this earlier and there's a much nicer and simpler solution, though it requires vm changes :). The idea is to change primitive 60 to work for 2 parameters too, so #at:ifAbsent: could also use the primitive directly. There were a few changes about #at: recently (mirror prims, cog), so I'm not sure it's still easy to change the primitive.

his is neat, but it wouldn't be an extension to primitive 60 but a new primitive that would ignore its top of stack.  primitive 60 can operate with 2 args,when used as a mirror primitive,  but then it ignores the receiver.  Your usage would ignore the last argument.

So choose a primitive number and I'll make it so...



Levente



--
Best regards,
Igor Stasenko.






--
best,
Eliot

Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-project] #at:ifAbsent: for Array(s) could be faster

Igor Stasenko

On 2 November 2011 20:55, Eliot Miranda <[hidden email]> wrote:

>
>
>
> On Wed, Nov 2, 2011 at 11:33 AM, Levente Uzonyi <[hidden email]> wrote:
>>
>> On Wed, 2 Nov 2011, Igor Stasenko wrote:
>>
>>> Here the original implementation, which Array inherits from
>>> SequenceableCollection:
>>>
>>> at: index ifAbsent: exceptionBlock
>>>        "Answer the element at my position index. If I do not contain an element
>>>        at index, answer the result of evaluating the argument, exceptionBlock."
>>>
>>>        (index between: 1 and: self size) ifTrue: [^ self at: index].
>>>        ^ exceptionBlock value
>>>
>>>
>>> the thing is, that #at: primivite also doing a range checking and
>>> fails if index is invalid, so if index is valid, as result we
>>> performing a range checking twice instead of just once.
>>>
>>> Here the simple implementation how to make things faster:
>>> fat: index
>>>        "Primitive. Assumes receiver is indexable. Answer the value of an
>>>        indexable element in the receiver. Fail if the argument index is not an
>>>        Integer or is out of bounds. Essential. See Object documentation
>>>        whatIsAPrimitive. Read the class comment for a discussion about that the fact
>>>        that the index can be a float."
>>>
>>>        <primitive: 60>
>>>        ^ #plopp
>>>
>>> fat: index ifAbsent: aBlock
>>>        | x |
>>>        x := (self fat: index).
>>>        x == #plopp ifTrue: [ ^ aBlock value ].
>>>        ^ x
>>>
>>> And speedup is almost 2 times:
>>>
>>> [ 1000000 timesRepeat: [ #(1 2 4 5 56 67 67) at: 1 ifAbsent: nil ] ] timeToRun
>>>
>>> 39
>>>
>>> [ 1000000 timesRepeat: [ #(1 2 4 5 56 67 67) at: 1 ifAbsent: nil ] ] timeToRun
>>>
>>> 20
>>>
>>> Even when index is out of range, it is still much faster:
>>>
>>> [ 1000000 timesRepeat: [ #(1 2 4 5 56 67 67) at: 100 ifAbsent: nil ] ] timeToRun
>>>
>>> 35
>>>
>>> [ 1000000 timesRepeat: [ #(1 2 4 5 56 67 67) fat: 100 ifAbsent: nil ]
>>> ] timeToRun
>>>
>>> 24
>>>
>>> The problem, of course, that trick with #plopp is a dirty hack.
>>> Because if array include an #plopp object as element, it will report
>>> as it absent,
>>> instead of answering it.
>>> There is no way to ensure that any arbitrary object are never included
>>> as element in array, so this is no-go for generic replacement.
>>> But in controlled environment, when you know that none of arrays which
>>> you using with #fat:ifAbsent: will ever include #plopp (or pick your
>>> own unique object)
>>> it may be a good alternative.
>>
>> I've been thinking about this earlier and there's a much nicer and simpler solution, though it requires vm changes :). The idea is to change primitive 60 to work for 2 parameters too, so #at:ifAbsent: could also use the primitive directly. There were a few changes about #at: recently (mirror prims, cog), so I'm not sure it's still easy to change the primitive.
>
> his is neat, but it wouldn't be an extension to primitive 60 but a new primitive that would ignore its top of stack.  primitive 60 can operate with 2 args,when used as a mirror primitive,  but then it ignores the receiver.  Your usage would ignore the last argument.
> So choose a primitive number and I'll make it so...

Of course i was thinking about changing VM too. But i am uncertain if
it worth doing so, because maybe this is too low reward for putting
effort in it.

But if we going to make new prim , here's what i'd like to have
 - use proper prim error code(s) to distinguish between cases:
   - a receiver is not indexable/variable object
   - an index is integer, but points outside of array's range
   - an object passed as index is non-integer

this is to eliminate code which tries to guess what is really happen:

<primitive: 60>
index isInteger ifTrue:
                [self class isVariable
                        ifTrue: [self errorSubscriptBounds: index]
                        ifFalse: [self errorNotIndexable]].
        index isNumber
                ifTrue: [^self at: index asInteger]
                ifFalse: [self errorNonIntegerIndex]


Btw, this is an example of how much more flexibility we could earn, if
we could rewrite basic prims to count (and use) arguments from bottom
of stack (receiver) instead from top,
so primitives could be used in more flexible manner.


--
Best regards,
Igor Stasenko.
Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-project] #at:ifAbsent: for Array(s) could be faster

Eliot Miranda-2
 


On Wed, Nov 2, 2011 at 1:15 PM, Igor Stasenko <[hidden email]> wrote:

On 2 November 2011 20:55, Eliot Miranda <[hidden email]> wrote:
>
>
>
> On Wed, Nov 2, 2011 at 11:33 AM, Levente Uzonyi <[hidden email]> wrote:
>>
>> On Wed, 2 Nov 2011, Igor Stasenko wrote:
>>
>>> Here the original implementation, which Array inherits from
>>> SequenceableCollection:
>>>
>>> at: index ifAbsent: exceptionBlock
>>>        "Answer the element at my position index. If I do not contain an element
>>>        at index, answer the result of evaluating the argument, exceptionBlock."
>>>
>>>        (index between: 1 and: self size) ifTrue: [^ self at: index].
>>>        ^ exceptionBlock value
>>>
>>>
>>> the thing is, that #at: primivite also doing a range checking and
>>> fails if index is invalid, so if index is valid, as result we
>>> performing a range checking twice instead of just once.
>>>
>>> Here the simple implementation how to make things faster:
>>> fat: index
>>>        "Primitive. Assumes receiver is indexable. Answer the value of an
>>>        indexable element in the receiver. Fail if the argument index is not an
>>>        Integer or is out of bounds. Essential. See Object documentation
>>>        whatIsAPrimitive. Read the class comment for a discussion about that the fact
>>>        that the index can be a float."
>>>
>>>        <primitive: 60>
>>>        ^ #plopp
>>>
>>> fat: index ifAbsent: aBlock
>>>        | x |
>>>        x := (self fat: index).
>>>        x == #plopp ifTrue: [ ^ aBlock value ].
>>>        ^ x
>>>
>>> And speedup is almost 2 times:
>>>
>>> [ 1000000 timesRepeat: [ #(1 2 4 5 56 67 67) at: 1 ifAbsent: nil ] ] timeToRun
>>>
>>> 39
>>>
>>> [ 1000000 timesRepeat: [ #(1 2 4 5 56 67 67) at: 1 ifAbsent: nil ] ] timeToRun
>>>
>>> 20
>>>
>>> Even when index is out of range, it is still much faster:
>>>
>>> [ 1000000 timesRepeat: [ #(1 2 4 5 56 67 67) at: 100 ifAbsent: nil ] ] timeToRun
>>>
>>> 35
>>>
>>> [ 1000000 timesRepeat: [ #(1 2 4 5 56 67 67) fat: 100 ifAbsent: nil ]
>>> ] timeToRun
>>>
>>> 24
>>>
>>> The problem, of course, that trick with #plopp is a dirty hack.
>>> Because if array include an #plopp object as element, it will report
>>> as it absent,
>>> instead of answering it.
>>> There is no way to ensure that any arbitrary object are never included
>>> as element in array, so this is no-go for generic replacement.
>>> But in controlled environment, when you know that none of arrays which
>>> you using with #fat:ifAbsent: will ever include #plopp (or pick your
>>> own unique object)
>>> it may be a good alternative.
>>
>> I've been thinking about this earlier and there's a much nicer and simpler solution, though it requires vm changes :). The idea is to change primitive 60 to work for 2 parameters too, so #at:ifAbsent: could also use the primitive directly. There were a few changes about #at: recently (mirror prims, cog), so I'm not sure it's still easy to change the primitive.
>
> his is neat, but it wouldn't be an extension to primitive 60 but a new primitive that would ignore its top of stack.  primitive 60 can operate with 2 args,when used as a mirror primitive,  but then it ignores the receiver.  Your usage would ignore the last argument.
> So choose a primitive number and I'll make it so...

Of course i was thinking about changing VM too. But i am uncertain if
it worth doing so, because maybe this is too low reward for putting
effort in it.

But if we going to make new prim , here's what i'd like to have
 - use proper prim error code(s) to distinguish between cases:
  - a receiver is not indexable/variable object
  - an index is integer, but points outside of array's range
  - an object passed as index is non-integer

I agree.  I was going to brag that the Cog VM already has this implemented but the error code returned for a non-indexable receiver is wrong.  It should be #'bad receiver' but the prim always answers #'bad index'.  I'll fix this.  Primitive error codes and relevant usage in primitives still need to be ported to the standard VM though.


this is to eliminate code which tries to guess what is really happen:

<primitive: 60>
index isInteger ifTrue:
               [self class isVariable
                       ifTrue: [self errorSubscriptBounds: index]
                       ifFalse: [self errorNotIndexable]].
       index isNumber
               ifTrue: [^self at: index asInteger]
               ifFalse: [self errorNonIntegerIndex]


Btw, this is an example of how much more flexibility we could earn, if
we could rewrite basic prims to count (and use) arguments from bottom
of stack (receiver) instead from top,
so primitives could be used in more flexible manner.


--
Best regards,
Igor Stasenko.



--
best,
Eliot

Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-project] #at:ifAbsent: for Array(s) could be faster

David T. Lewis
 
On Wed, Nov 02, 2011 at 01:23:33PM -0700, Eliot Miranda wrote:

>  
> On Wed, Nov 2, 2011 at 1:15 PM, Igor Stasenko <[hidden email]> wrote:
> >
> > Of course i was thinking about changing VM too. But i am uncertain if
> > it worth doing so, because maybe this is too low reward for putting
> > effort in it.
> >
> > But if we going to make new prim , here's what i'd like to have
> >  - use proper prim error code(s) to distinguish between cases:
> >   - a receiver is not indexable/variable object
> >   - an index is integer, but points outside of array's range
> >   - an object passed as index is non-integer
> >
>
> I agree.  I was going to brag that the Cog VM already has this implemented
> but the error code returned for a non-indexable receiver is wrong.  It
> should be #'bad receiver' but the prim always answers #'bad index'.  I'll
> fix this.  Primitive error codes and relevant usage in primitives still
> need to be ported to the standard VM though.

Hopefully we're not too far apart on this. I added your basic mechanism
for error codes a while back (see below) so adding the relevant usage to
primitives should be strightforward, and in principle a new primitive that
uses error codes should work on both Cog and the standard VM.

Dave

  Name: VMMaker-dtl.237
  Author: dtl
  Time: 23 May 2011, 11:38:51.478 pm
  UUID: db716650-3b1b-4d54-9aba-6257a21a50a7
  Ancestors: VMMaker-dtl.236
 
  VMMaker 4.5.1
 
  Convert primitive error reporting to use #primitiveFailFor: such that the
  successFlag variable is replaced with primFailCode (integer value, 0 for
  success, 1, 2, 3... for failure codes), consistent with oscog.
 
  Includes improved #failed, #successful, and #success: methods recoded for
  performance in C, yielding performance equivalent to previous implementation
  with successFlag.
 
  Changes were applied as follows:
 
  - Replaced all occurances of "successFlag := true" with "self initPrimCall",
    which initialize primFailCode to 0.
  - Replaced all "successFlag := false" with "self primitiveFail".
  - Replaced all "successFlag ifTrue: [] ifFalse: []" with
    "self successful ifTrue: [] ifFalse: []".
  - Updated #primitiveFail, #failed and #success: to use primFailCode rather
    than successFlag.
  - Removed successFlag variable.

Reply | Threaded
Open this post in threaded view
|

Re: [Pharo-project] #at:ifAbsent: for Array(s) could be faster

stephane ducasse-2
In reply to this post by Eliot Miranda-2

Elliot

How could we help there.

I would really like to sit with igor and do some stupid work on the VM. The best way to learn is to do something besides a master. So if this would be to help in addition I would
really allocate time on my wonderfully exciting admin tasks granted by my status :)

> I agree.  I was going to brag that the Cog VM already has this implemented but the error code returned for a non-indexable receiver is wrong.  It should be #'bad receiver' but the prim always answers #'bad index'.  I'll fix this.  Primitive error codes and relevant usage in primitives still need to be ported to the standard VM though.

Stef