Object basicNew vs object shallowCopy

Denis Kudriashov
 
Hi.

I compared the performance of object instantiation and object cloning, and I was surprised that instantiation is almost twice as fast as cloning (primitive 70 vs. 148).

Could you explain why that is, and whether it could be improved?

I thought that constructing a new object would be more complex, because it has to fill in all the object's fields (header structure, etc.).
And I thought that copying is just a simple function like memcpy, which copies bytes without any logic.

Here is my code:

object := Object new.
3 timesRepeat: [ Smalltalk garbageCollect ].
result1 := [ Object basicNew ] benchFor: 10 seconds.
3 timesRepeat: [ Smalltalk garbageCollect ].
result2 := [ object shallowCopy ] benchFor: 10 seconds.
{result1. result2}.
 "an Array(a BenchmarkResult(518,021,045 iterations in 10 seconds 2 milliseconds. 51,791,746 per second) a BenchmarkResult(302,807,253 iterations in 10 seconds 4 milliseconds. 30,268,618 per second))" 

(I ran it on the latest Pharo on a Mac Spur VM.)

Best regards,
Denis

Re: Object basicNew vs object shallowCopy

Eliot Miranda-2

Hi Denis,

    the difference is because basicNew[:] are implemented with machine code primitives, whereas shallowCopy is an interpreter primitive, and machine code primitives are much faster to invoke.  I could add a machine code shallowCopy primitive that would handle the common cases and exclude the complex ones (CompiledMethod and Context, because they contain hidden JIT state that must not be copied).  How important is shallowCopy performance to you?

_,,,^..^,,,_ (phone)


Re: Object basicNew vs object shallowCopy

Eliot Miranda-2
In reply to this post by Denis Kudriashov
 
BTW, to better compare the two, unroll the inner loop ten times, put Object in a temp to eliminate the indirection, and use explicit temps, i.e.

| o object |
o := Object new.
object := Object.
3 timesRepeat: [ Smalltalk garbageCollect ].
result1 := [ object basicNew. object basicNew. object basicNew. object basicNew. object basicNew. object basicNew. object basicNew. object basicNew. object basicNew. object basicNew ] benchFor: 10 seconds.
3 timesRepeat: [ Smalltalk garbageCollect ].
result2 := [ o shallowCopy. o shallowCopy. o shallowCopy. o shallowCopy. o shallowCopy. o shallowCopy. o shallowCopy. o shallowCopy. o shallowCopy. o shallowCopy ] benchFor: 10 seconds.
{result1. result2}.

_,,,^..^,,,_ (phone)


Re: Object basicNew vs object shallowCopy

Denis Kudriashov
In reply to this post by Eliot Miranda-2
 
Hi Eliot,

2016-07-30 7:48 GMT+02:00 Eliot Miranda <[hidden email]>:
> the difference is because basicNew[:] are implemented with machine code primitives, whereas shallowCopy is an interpreter primitive, and machine code primitives are much faster to invoke.  I could add a machine code shallowCopy primitive that would handle the common cases and exclude the complex ones (CompiledMethod and Context, because they contain hidden JIT state that must not be copied).  How important is shallowCopy performance to you?

I would not say it is critical, but it could improve some prototype-based frameworks where objects are created by cloning.

What performance do you expect from the optimized version? Will it be faster than #basicNew?


Re: Object basicNew vs object shallowCopy

Levente Uzonyi
 
It can't be quicker than basicNew. It will be quicker than basicNew +
manual copying of the fields.
And yes, this is quite important to be quick along with the #copyFrom:
primitive (168).

Levente
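
For readers who have not written it by hand, "basicNew + manual copying of the fields" might look roughly like the following. This is only a minimal sketch, not code from the thread: the variable names are made up for illustration, and it assumes an object with named instance variables only (indexable slots are not handled).

```smalltalk
"Illustrative sketch only; the variable names are hypothetical and not from the thread."
| original manualCopy |
original := 10@20.
"Allocate an uninitialized instance of the same class."
manualCopy := original class basicNew.
"Copy each named instance variable by index (fixed slots only; indexable slots are not handled)."
1 to: original class instSize do: [ :i |
	manualCopy instVarAt: i put: (original instVarAt: i) ].
"Compare the hand-made copy and the primitive-based copy against the original."
{ manualCopy = original. original shallowCopy = original }.
```

Each instVarAt:put: here is a separate message send, which is presumably why a single copying primitive is expected to beat this approach.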

Re: Object basicNew vs object shallowCopy

Eliot Miranda-2

Hi Denis, Hi Levente,

> On Jul 31, 2016, at 3:41 AM, Levente Uzonyi <[hidden email]> wrote:
>
> It can't be quicker than basicNew.

Depends. Decoding size and encoding header from the class receiver in basicNew might be slower than decoding size and encoding header from the instance receiver in shallowCopy.  But the difference should be small.

> It will be quicker than basicNew + manual copying of the fields.

Agreed.

> And yes, this is quite important to be quick along with the #copyFrom: primitive (168).

It's on the to do list then :-)

> Levente

_,,,^..^,,,_ (phone)

Re: Object basicNew vs object shallowCopy

johnmci
 
So has anyone thought about pre-allocating a few commonly used objects during idle time?  Then grabbing that object and filling in the actual details when one is needed. The Squeak VM did that for method context (and recycled them).

Also, a decade back I looked at the new logic and rearranged the code to fast-path the creation of commonly used objects. This improved things a bit but didn't really outweigh the resulting mess of if statements.


Re: Object basicNew vs object shallowCopy

Eliot Miranda-2

Hi John,

> On Jul 31, 2016, at 7:32 AM, John McIntosh <[hidden email]> wrote:
>
> So has anyone thought about pre-allocating a few commonly used objects during idle time?  Then grabbing that object and filling in the actual details when one is needed. The Squeak VM did that for method context (and recycled them).

This only works to the extent that a particular kind of object is allocated all the time, and to the extent that synthesizing an object is expensive relative to filling in its slots.  This works for contexts above a complex object representation.  In my experience it doesn't work for floats either. But Cog does not allocate contexts often because of context-to-stack mapping and because the Spur object representation is simple, regular and very quick to synthesize (allocate).


Re: Object basicNew vs object shallowCopy

timrowledge



Yeah but with a 64-bit address space we could pre-allocate *millions* of objects of every class! And speculatively initialise them!

Oh, wait; what do you mean we don’t have 17,592,186,044,416Mb ram in our Raspberry Pi’s yet?


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Useful random insult:- A one-bit brain with a parity error.



Re: Object basicNew vs object shallowCopy

Denis Kudriashov
 
Hi.
I measured the latest VM with shallowCopy support. Copying is now much faster:

object := 10@20.
3 timesRepeat: [ Smalltalk garbageCollect ].
result1 := [ Point x: 10 y: 20 ] benchFor: 10 seconds.

3 timesRepeat: [ Smalltalk garbageCollect ].
result2 := [ object shallowCopy ] benchFor: 10 seconds.
{result1. result2}.

"a BenchmarkResult(310,321,301 iterations in 10 seconds 2 milliseconds. 31,025,925 per second) 
a BenchmarkResult(426,311,468 iterations in 10 seconds 3 milliseconds. 42,618,361 per second)" 

But compared with "Point basicNew" it is almost the same:

 "a BenchmarkResult(402,708,088 iterations in 10 seconds 2 milliseconds. 40,262,756 per second) 
a BenchmarkResult(405,145,766 iterations in 10 seconds 3 milliseconds. 40,502,426 per second)"

It also improves my veryDeepCopy test, which ran faster on the pre-Spur VM (the pre-Spur VM is still faster):

m := Morph new.
r2 := [ m veryDeepCopy  ] benchFor: 10 seconds. 

"a BenchmarkResult(34,007 iterations in 10 seconds 3 milliseconds. 3,400 per second)" - no shallow copy optimization

"a BenchmarkResult(43,333 iterations in 10 seconds 1 millisecond. 4,333 per second)" - latest VM with shallow copy.

"a BenchmarkResult(52,985 iterations in 10 seconds 1 millisecond. 5,298 per second)" - preSpur VM


