Re: [Pharo-dev] shallowCopy problem on 64 bit Pharo ?


Clément Béra
 
I remember there was a discussion about this somewhere, but I can't find it. I'm cc'ing vm-dev; they may have a clue.

When copying a pointer object in 64 bits instead of 32 bits, you need to copy twice as much data, so it is going to be slower in any case. 2.63 times slower seems too much, however.

The latest Pharo 64-bit image can be found here: https://ci.inria.fr/pharo/job/Pharo-6.0-Update-Step-3.1-64bits/

The latest Pharo 64-bit VM can be found here: https://bintray.com/opensmalltalk/vm/cog

Best,

On Sun, Feb 5, 2017 at 1:27 PM, Ciprian Teodorov <[hidden email]> wrote:
Hi all,

I'm very happy to see that the 64 bit Pharo vm is progressing.
I've even managed to get a ~6.85 GB heap allocated (see http://bit.ly/2lbp8n6).
This is great!

There seems, however, to be a small problem with the #shallowCopy message, which is 2.63 times slower on the 64-bit VM (image/VM details below).

The bench that I used is a simple random graph analysis tool intended to do a lot of random memory accesses on big heaps; it is available at http://www.smalltalkhub.com/#!/~CipT/PlugMC
In this case I expect the execution time to be dominated by the Set implementation (which is the case with Pharo 5, see http://bit.ly/2lbzJhd), and not by the array copy (see http://bit.ly/2kvbqvy).

Is this a 64-bit limitation, or only a feature "not yet available"?
Where can I get the latest versions of the 64-bit Pharo image and VM?

Image
-----
/Users/ciprian/Downloads/Pharo64/60371-64/Pharo64-60371.image
Pharo6.0
Latest update: #60371
Unnamed

Virtual Machine
---------------
/Users/ciprian/Downloads/Pharo64/Pharo 4.app/Contents/MacOS/Pharo
CoInterpreter * VMMaker.oscog-eem.2111 uuid: 7c02b557-bdcc-4a91-92b1-7fc15f1e8605 Jan 27 2017
StackToRegisterMappingCogit * VMMaker.oscog-eem.2111 uuid: 7c02b557-bdcc-4a91-92b1-7fc15f1e8605 Jan 27 2017
VM: 201701271449 https://github.com/pharo-project/pharo-vm.git $ Date: Fri Jan 27 15:49:20 2017 +0100 $ Plugins: 201701271449 https://github.com/pharo-project/pharo-vm.git $

Mac OS X built on Jan 27 2017 15:28:14 UTC Compiler: 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)
VMMaker versionString VM: 201701271449 https://github.com/pharo-project/pharo-vm.git $ Date: Fri Jan 27 15:49:20 2017 +0100 $ Plugins: 201701271449 https://github.com/pharo-project/pharo-vm.git $
CoInterpreter * VMMaker.oscog-eem.2111 uuid: 7c02b557-bdcc-4a91-92b1-7fc15f1e8605 Jan 27 2017
StackToRegisterMappingCogit * VMMaker.oscog-eem.2111 uuid: 7c02b557-bdcc-4a91-92b1-7fc15f1e8605 Jan 27 2017


Cheers,
--
Dr. Ciprian TEODOROV
Enseignant-chercheur
ENSTA Bretagne

tél : 06 08 54 73 48

mail : [hidden email]
www.teodorov.ro

timrowledge
 

> On 05-02-2017, at 5:08 AM, Clément Bera <[hidden email]> wrote:
>
> I remember there was a discussion about this somewhere, but I can't find it. I'm cc'ing vm-dev; they may have a clue.
>
> When copying a pointer object in 64 bits instead of 32 bits, you need to copy twice as much data, so it is going to be slower in any case.

Err, not really. Probably. Assuming you have a 64-bit CPU etc., of course. And it depends on details of the memory architecture outside the CPU too - after all, many systems do not need the memory chip organisation to match the CPU word size, having multiple lanes, burst-read cache loading, even heterogeneous regions (I suspect mostly in embedded systems for that, but y’never know).

Yes, you’re moving twice as much stuff, but it will still be a single read & write per word. After that you’re at the mercy of cache lines, write buffers, chip specs, not to mention the Hamsters.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
We can rescue a hostage or bankrupt a system. Now, what would you like us to do?


Ciprian Teodorov-3
 
Thanks guys, I'll try with the latest version and come back with updates.


Ciprian Teodorov-3
 
It is strange: to me it seems like <primitive: 148> fails and falls back to the Smalltalk implementation (http://bit.ly/2kjYdHv).
However, when copying a small array, like #(1 2 3 4) copy, I cannot step into #shallowCopy,
nor can I when copying a big array, like (1 to: 100000) asArray copy.

However, when I press cmd+. while running my bench, the debugger stops in shallowCopy.

Is this a debugger thing?
Or does the primitive really fail? That could explain the > 2.6x slowdown.

best regards,
cip

Ben Coman
 


On Tue, Feb 7, 2017 at 3:05 AM, Ciprian Teodorov <[hidden email]> wrote:
> It is strange: to me it seems like <primitive: 148> fails and falls back to the Smalltalk implementation (http://bit.ly/2kjYdHv).
> However, when copying a small array, like #(1 2 3 4) copy, I cannot step into #shallowCopy,
> nor can I when copying a big array, like (1 to: 100000) asArray copy.
>
> However, when I press cmd+. while running my bench, the debugger stops in shallowCopy.
>
> Is this a debugger thing?

To check, can you add a Transcript output on the line right after the primitive pragma?
cheers -ben

 
Ciprian Teodorov-3
 
Thanks Ben,

<primitive: 148> seems to fail about 4-5% of the time with my bench (OS X 10.11.6, the latest Pharo/Cog):

# of copy calls    Failing primitive 148    Failing rate
1710               77                       4.50%
3049               133                      4.36%
51562              2947                     5.72%

and it does not seem to fail at all with something like:

1 to: 1000 do: [:i |
    (1 to: 100000) asArray copy.
]

cheers

Ben Coman
In reply to this post by Clément Béra
 
On Sun, Feb 5, 2017 at 9:08 PM, Clément Bera <[hidden email]> wrote:
>
> I remember there was a discussion about this somewhere, but I can't find it. I'm cc'ing vm-dev; they may have a clue.
>
> When copying a pointer object in 64 bits instead of 32 bits, you need to copy twice as much data, so it is going to be slower in any case. 2.63 times slower seems too much, however.
>
> The latest Pharo 64-bit image can be found here: https://ci.inria.fr/pharo/job/Pharo-6.0-Update-Step-3.1-64bits/

That download was failing for me: about halfway through it
completed without error, but the zip was corrupt.
It worked from here: http://files.pharo.org/image/60/

cheers -ben

Ben Coman
In reply to this post by Ciprian Teodorov-3
 

Try the following experiment.
Copy Object>>shallowCopy to Object>>monitorShallowCopy
and after the pragma add...
     Smalltalk at: #Monitor put: #Failed.


Then in Playground...
    lastfail := 0.
    1 to: 100000 do: [ :n |
        | src copy |
        src := Array new: n.
        Smalltalk at: #Monitor put: #Succeeded.
        copy := src monitorShallowCopy.
        (Smalltalk at: #Monitor) == #Failed ifTrue: [
            Transcript crShow: n; tab; show: n - lastfail.
            lastfail := n ] ].

Produces the following interesting result....

RUN1...

65559 65559
67670 2111
67685 15
67700 15
67715 15
67730 15
...
69860 15
69875 15
69890 15
69905 15
72334 2429
72348 14
72362 14
72376 14
72390 14
...
74854 14
74868 14
74882 14
74896 14
77681 2785
77694 13
77707 13
77720 13
77733 13
...
80619 13
80632 13
80645 13
80658 13
83894 3236
83906 12
83918 12
83930 12
83942 12
...
87338 12
87350 12
87362 12
87374 12
91189 3815
91200 11
91211 11
91222 11
91233 11
...
95292 11
95303 11
95314 11
95325 11
99867 4542
99877 10
99887 10
99897 10
99907 10
99917 10
99927 10
99937 10
99947 10
99957 10
99967 10
99977 10
99987 10
99997 10


RUN2...

67660 67660
67675 15
67690 15
67705 15
67720 15
....
69865 15
69880 15
69895 15
69910 15
72324 2414
72338 14
72352 14
72366 14
72380 14
....
74858 14
74872 14
74886 14
74900 14
77685 2785
77698 13
77711 13
77724 13
77737 13
....
80623 13
80636 13
80649 13
80662 13
83898 3236
83910 12
83922 12
83934 12
83946 12
...
87342 12
87354 12
87366 12
87378 12
91193 3815
91204 11
91215 11
91226 11
91237 11
...
95285 11
95296 11
95307 11
95318 11
99871 4553
99881 10
99891 10
99901 10
99911 10
99921 10
99931 10
99941 10
99951 10
99961 10
99971 10
99981 10
99991 10

This is with 
* 60375-64.zip
* cog_win64x64_squeak.stack.spur_201702021058.zip
* Windows 7 Professional SP1

cheers -ben


Ciprian Teodorov-3
 
Interesting: I see similar behavior with your experiment,
with the distance between failures varying between 16 and 25 overall.

cheers,
cip

Bert Freudenberg
 
I would try logging the number of incremental and full GCs along with the failures. Just a hunch (the prim might fail for OOM).
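
A minimal sketch of that logging, adapted from Ben's Playground loop. It assumes Object>>monitorShallowCopy is installed as Ben describes, and that VM parameters 7 and 9 are the full-GC and incremental-GC (scavenge) counters; please check the comment of vmParameterAt: in your image before relying on those indices.

```smalltalk
"Sketch only: report the GC activity observed between consecutive primitive failures."
| vm lastFull lastIncr |
vm := Smalltalk vm.
lastFull := vm parameterAt: 7.   "assumed: number of full GCs since startup"
lastIncr := vm parameterAt: 9.   "assumed: number of incremental GCs since startup"
1 to: 100000 do: [ :n |
    | src copy |
    src := Array new: n.
    Smalltalk at: #Monitor put: #Succeeded.
    copy := src monitorShallowCopy.
    (Smalltalk at: #Monitor) == #Failed ifTrue: [
        | full incr |
        full := vm parameterAt: 7.
        incr := vm parameterAt: 9.
        Transcript
            crShow: n printString; tab;
            show: 'full GCs: ', (full - lastFull) printString; tab;
            show: 'incremental GCs: ', (incr - lastIncr) printString.
        lastFull := full.
        lastIncr := incr ] ]
```

If the failures line up with full GCs, that would support the out-of-memory hunch; if the counters do not move between failures, something else is going on.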

- Bert -

Levente Uzonyi
In reply to this post by Ben Coman
 
What's the error code when the primitive fails?

Levente
Clément Béra
 
Hi,

I tried to analyse the problem and I think I found the cause and a potential solution.

I have just tried Ben's script on the latest 32-bit VM (Squeak.cog.spur); the results are in [1] at the bottom of this mail. I modified the script to print the error codes. The ratios are a bit different from 64 bits, but the same pattern is present. The primitive fails once every 40-60 allocations in 32 bits instead of once every 10-15 allocations in 64 bits, and after roughly every 15 failures, allocations work better for a short while. The primitive always fails with 'insufficient object memory'.
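
For reference, a sketch of how the monitoring method could record the failure code. This is not necessarily the exact change made to the script: it assumes the `<primitive: 148 error: ec>` pragma form, which binds the failure code to `ec`, and it falls back by re-sending #shallowCopy instead of duplicating the stock fallback code.

```smalltalk
monitorShallowCopy
    "Hypothetical variant of Ben's Object>>monitorShallowCopy.
     The code below the pragma runs only when primitive 148 fails;
     ec then holds the failure code (or nil if the VM gives none)."
    <primitive: 148 error: ec>
    Smalltalk at: #Monitor put: ec.
    ^ self shallowCopy
```

With this variant the Playground loop would test `(Smalltalk at: #Monitor) ~~ #Succeeded` instead of `== #Failed`, and print `(Smalltalk at: #Monitor) printString` as a third column.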

The allocation strategy is different for objects whose size cannot be encoded in 16 bits (in our case, arrays with more than 65535 fields). Large objects are allocated directly in old space. The failures in shallowCopy happen in this case. I believe the case where many large objects are allocated in a row is not really optimised because it is supposed to be uncommon. If it's common in someone's use case, I am pretty sure we can do something about it.

Because memory is measured in bytes and array fields are twice as big in 64 bits, I would expect failures to be twice as frequent in 64 bits as in 32 bits. They seem to be 4 times more frequent, but different people did the 64-bit measurements on different machines, so other side effects may need to be considered.
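
The expectation can be checked with a little arithmetic. The sketch below is only illustrative: the array size is an arbitrary example above the 65535-field limit, object headers are ignored, and the two failure distances are taken from runs at slightly different array sizes on different machines (58 from [1], 15 from Ben's 64-bit run).

```smalltalk
"Sketch: payload size of one pointer array in each word size, header excluded."
| fields bytes32 bytes64 |
fields := 70000.          "example size, larger than 65535 fields"
bytes32 := fields * 4.    "4-byte slots in a 32-bit image"
bytes64 := fields * 8.    "8-byte slots in a 64-bit image"
Transcript
    crShow: '32-bit payload: ', bytes32 printString, ' bytes';
    crShow: '64-bit payload: ', bytes64 printString, ' bytes';
    crShow: 'expected ratio: ', (bytes64 / bytes32) printString;
    crShow: 'observed ratio of failure distances: ', (58 / 15) asFloat printString.
```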

One solution I see is the following (Pharo version; in Squeak, use vmParameterAt: directly):

coef := 2.
"25: memory headroom when growing object memory"
Smalltalk vm parameterAt: 25 put: (Smalltalk vm parameterAt: 25) * coef.
"24: memory threshold above which object memory is shrunk"
Smalltalk vm parameterAt: 24 put: (Smalltalk vm parameterAt: 24) * coef.

Basically, I change the old space heuristics to allocate bigger segments and not to shrink too aggressively.

With a coef of 2, I see the primitive failing once every 58-87 allocations instead of once every 40-60 allocations.
With a coef of 10, I see the primitive failing once every 350-700 allocations. The results for coef 10 are in [2] at the bottom of the mail.

Obviously, with these settings the image uses a bit more RAM, but I guess in Ciprian's use case, where images are 6.8 GB large, wasting a dozen extra MB does not really matter.

Coef 2 may lead to a waste of ~15 MB
Coef 10 may lead to a waste of ~150 MB
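
If the extra RAM is only acceptable while a benchmark runs, the change can be scoped. This is a sketch under assumptions: the temporary names reflect my reading of parameters 24 and 25, the workload is a placeholder, and I have not checked whether the VM constrains the order in which the two values are written, so the restore simply reverses the order used above.

```smalltalk
"Sketch: apply the coefficient only for the duration of a workload, then restore."
| vm coef oldShrinkThreshold oldGrowHeadroom |
vm := Smalltalk vm.
coef := 2.
oldShrinkThreshold := vm parameterAt: 24.
oldGrowHeadroom := vm parameterAt: 25.
[ vm parameterAt: 25 put: oldGrowHeadroom * coef.
  vm parameterAt: 24 put: oldShrinkThreshold * coef.
  "Placeholder workload: replace with the real benchmark."
  1 to: 1000 do: [ :i | (Array new: 100000) shallowCopy ] ]
    ensure: [
        vm parameterAt: 24 put: oldShrinkThreshold.
        vm parameterAt: 25 put: oldGrowHeadroom ]
```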

I don't think there is a generic magic solution for 64 bits. Could we consider having segments twice as big by default in 64 bits? I don't know if that makes sense.

I have on my TODO list to build a GC object for Pharo (normally Squeak-compatible) to provide convenient APIs and documentation on how to adapt the GC policy in Spur for both growing and large heaps. Hopefully I will do that around June.

[1]
65631 65631 #'insufficient object memory'
65689 58 #'insufficient object memory'
65747 58 #'insufficient object memory'
...
65979 58 #'insufficient object memory'
66616 637 #'insufficient object memory'
66673 57 #'insufficient object memory'
66730 57 #'insufficient object memory'
...
67243 57 #'insufficient object memory'
67698 455 #'insufficient object memory'
67754 56 #'insufficient object memory'
67810 56 #'insufficient object memory'
...
68538 56 #'insufficient object memory'
68817 279 #'insufficient object memory'
68872 55 #'insufficient object memory'
...
99860 38 #'insufficient object memory'

[2]
66720 66720 #'insufficient object memory'
68303 1583 #'insufficient object memory'
69850 1547 #'insufficient object memory'
70231 381 #'insufficient object memory'
70610 379 #'insufficient object memory'
71363 753 #'insufficient object memory'
72107 744 #'insufficient object memory'
72844 737 #'insufficient object memory'
73574 730 #'insufficient object memory'
74296 722 #'insufficient object memory'
74654 358 #'insufficient object memory'
75011 357 #'insufficient object memory'
75719 708 #'insufficient object memory'
76071 352 #'insufficient object memory'
...
98404 816 #'insufficient object memory'
98945 541 #'insufficient object memory'
99214 269 #'insufficient object memory'

