[Pharo-dev] Exploring Heap Size Limits


[Pharo-dev] Exploring Heap Size Limits

Sven Van Caekenberghe-2
Hi,

I wrote a little test to explore the heap size limits of the current Pharo image+VM combination. I did some tests on an 8 GB 64-bit Linux machine, and the basic conclusion is that with the standard 1 GB heap size you can allocate close to 1 GB of actual data and still GC it. I think that is pretty good news. The machine has 4 effective cores, so it could easily run 4 busy images like that.

Pushing the limits further, things get weird, up to the point of crashing. Sadly, heap sizes of 2, 3 or 4 GB, which should theoretically be possible, fail, thus limiting the amount of data that can be allocated and used.

Anyway, my goal is to start the debate around this topic ;-)


Here is the test script:

t3@coolermaster:smalltalk$ cat memtest.st

NonInteractiveTranscript stdout install.

!

Transcript show: 'memtest.st'; cr.
Smalltalk garbageCollect.
Transcript show: SmalltalkImage current vm statisticsReport; cr.

!

| count size data timeToRun |
count := Smalltalk commandLine arguments first asInteger.
size := Smalltalk commandLine arguments second asInteger.
Transcript show: ('allocating: {1} times {2} bytes' format: { count. size }); cr.
timeToRun := [
  data := Array new: count streamContents: [ :out |
    count timesRepeat: [
      out nextPut: (ByteArray new: size) ] ].
] timeToRun.
Transcript
  show: ('allocated: {1}' format: { (data collect: #size) sum humanReadableSIByteSize });
  cr.
Transcript show: ('time to run {1}' format: { Duration milliSeconds: timeToRun }); cr.
Transcript show: '3 times GC'; cr.
timeToRun := [
  3 timesRepeat: [ Smalltalk garbageCollect ] ] timeToRun.
Transcript show: ('time to run {1}' format: { Duration milliSeconds: timeToRun }); cr.
Transcript show: SmalltalkImage current vm statisticsReport; cr.

!

SmalltalkImage current quitPrimitive.


And here is a normal result running get.pharo.org/20+vm

$ ./pharo Pharo.image memtest.st 512 1024000

memtest.st
uptime 0h0m0s
memory 25,076,708 bytes
        old 19,213,624 bytes (76.60000000000001%)
        young 5,384 bytes (0.0%)
        used 19,219,008 bytes (76.60000000000001%)
        free 5,857,700 bytes (23.400000000000002%)
GCs 10 (12ms between GCs)
        full 1 totalling 30ms (25.0% uptime), avg 30.0ms
        incr 9 totalling 6ms (5.0% uptime), avg 0.7000000000000001ms
        tenures 0

allocating: 512 times 1024000 bytes
allocated: 524.29 MB
time to run 0:00:00:02.852
3 times GC
time to run 0:00:00:00.196
uptime 0h0m3s
memory 546,746,364 bytes
        old 543,520,476 bytes (99.4%)
        young 7,184 bytes (0.0%)
        used 543,527,660 bytes (99.4%)
        free 3,218,704 bytes (0.6000000000000001%)
GCs 356 (9ms between GCs)
        full 89 totalling 2,743ms (86.4% uptime), avg 30.8ms
        incr 267 totalling 223ms (7.0% uptime), avg 0.8ms
        tenures 0
Since last view 346 (9ms between GCs)
        uptime 3.1s
        full 88 totalling 2,713ms (88.80000000000001% uptime), avg 30.8ms
        incr 258 totalling 217ms (7.1000000000000005% uptime), avg 0.8ms
        tenures 0


It takes about 2.9 s to allocate about 512 MB. A full GC then takes about 30 ms.

You can play a bit with the 2 parameters to generate different datasets.

Not changing any VM parameters, you can successfully do something like

$ ./pharo Pharo.image memtest.st 1000 1024000

or

$ ./pharo Pharo.image memtest.st 1024000 1000

thus allocating about 1 GB in each case.
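
Both invocations allocate the same total number of bytes, just in different object shapes; a quick check:

1000 * 1024000.      "1,024,000,000 bytes as a thousand ~1 MB ByteArrays"
1024000 * 1000.      "the same total, as roughly a million 1 KB ByteArrays"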

Playing with -mmap gives some weird results and unexpected slowness as well. Consider (the default -mmap is 1024m):

$ ./pharo -mmap 1280m Pharo.image memtest.st 500 1024000

memtest.st
uptime 0h0m0s
memory 25,076,716 bytes
        old 19,213,480 bytes (76.60000000000001%)
        young 5,384 bytes (0.0%)
        used 19,218,864 bytes (76.60000000000001%)
        free 5,857,852 bytes (23.400000000000002%)
GCs 12 (13ms between GCs)
        full 2 totalling 56ms (37.1% uptime), avg 28.0ms
        incr 10 totalling 6ms (4.0% uptime), avg 0.6000000000000001ms
        tenures 0

allocating: 500 times 1024000 bytes
allocated: 512.00 MB
time to run 0:00:00:13.66
3 times GC
time to run 0:00:00:00.192
uptime 0h0m14s
memory 533,946,384 bytes
        old 531,232,232 bytes (99.5%)
        young 7,644 bytes (0.0%)
        used 531,239,876 bytes (99.5%)
        free 2,706,508 bytes (0.5%)
GCs 1,011 (14ms between GCs)
        full 501 totalling 13,819ms (98.7% uptime), avg 27.6ms
        incr 510 totalling 35ms (0.2% uptime), avg 0.1ms
        tenures 0
Since last view 999 (14ms between GCs)
        uptime 13.9s
        full 499 totalling 13,763ms (99.30000000000001% uptime), avg 27.6ms
        incr 500 totalling 29ms (0.2% uptime), avg 0.1ms
        tenures 0


So increasing the heap from 1024 MB to 1280 MB slows the allocation down to about 14 s; the statistics suggest nearly all of that time goes into full garbage collections (501 full GCs totalling 13.8 s over the run).

Increasing the heap size to 1792 MB gives the following result:

t3@coolermaster:smalltalk$ ./pharo -mmap 1792m Pharo.image memtest.st 1280 1048576

memtest.st
uptime 0h0m0s
memory 25,076,708 bytes
        old 19,213,592 bytes (76.60000000000001%)
        young 5,384 bytes (0.0%)
        used 19,218,976 bytes (76.60000000000001%)
        free 5,857,732 bytes (23.400000000000002%)
GCs 12 (12ms between GCs)
        full 2 totalling 55ms (37.9% uptime), avg 27.5ms
        incr 10 totalling 6ms (4.1000000000000005% uptime), avg 0.6000000000000001ms
        tenures 0

allocating: 1280 times 1048576 bytes
allocated: 1.34 GB
time to run 0:00:00:35.258
3 times GC
time to run 0:00:00:00.37
uptime 0h0m35s
memory -783,340,292 bytes
        old -786,061,544 bytes (100.30000000000001%)
        young 7,460 bytes (0.0%)
        used -786,054,084 bytes (100.30000000000001%)
        free 2,713,792 bytes (-0.30000000000000004%)
GCs 2,572 (14ms between GCs)
        full 1281 totalling 35,444ms (99.10000000000001% uptime), avg 27.700000000000003ms
        incr 1291 totalling 93ms (0.30000000000000004% uptime), avg 0.1ms
        tenures 0
Since last view 2,560 (14ms between GCs)
        uptime 35.6s
        full 1279 totalling 35,389ms (99.30000000000001% uptime), avg 27.700000000000003ms
        incr 1281 totalling 87ms (0.2% uptime), avg 0.1ms
        tenures 0


The allocation takes a lot longer, but the GC times are stable.

I suspect the negative numbers in the report have to do with SmallInteger limitations, but the question is whether it is only a cosmetic problem or not.
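
A quick workspace check supports that suspicion (a sketch, assuming the statistics wrap around at the 31-bit signed boundary of a 32-bit image's SmallInteger representation):

SmallInteger maxVal.              "1073741823 on a 32-bit image, i.e. (2 raisedTo: 30) - 1"
-783340292 + (2 raisedTo: 31).    "1364143356, roughly the 1.34 GB of data plus the ~20 MB base image"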

Again, I am not complaining, just exploring/wondering.


Sven



Re: [Pharo-dev] Exploring Heap Size Limits

philippeback
On OS X Lion with 8 GB RAM, the first call gives me:

[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./run.sh
uptime 0h0m0s
memory 25,004,440 bytes
old 19,213,728 bytes (76.80000000000001%)
young 5,384 bytes (0.0%)
used 19,219,112 bytes (76.9%)
free 5,785,328 bytes (23.1%)
GCs 5 (127ms between GCs)
full 1 totalling 72ms (11.4% uptime), avg 72.0ms
incr 4 totalling 11ms (1.7000000000000002% uptime), avg 2.8000000000000003ms
tenures 0

allocating: 512 times 1024000 bytes

===============================================================================
Notice: Errors in script loaded from /Users/philippeback/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest/memtest.st
===============================================================================
Errors in script loaded from /Users/philippeback/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest/memtest.st
==== Startup Error: OutOfMemory
ByteArray class(Behavior)>>basicNew:
ByteArray class(Behavior)>>new:
UndefinedObject>>DoIt in Block: [t6...
SmallInteger(Integer)>>timesRepeat:
UndefinedObject>>DoIt in Block: [:t6 | t1...
Array class(SequenceableCollection class)>>new:streamContents:
UndefinedObject>>DoIt in Block: [t4 := Array...
Time class>>millisecondsToRun:
BlockClosure>>timeToRun
UndefinedObject>>DoIt
Compiler>>evaluate:in:to:notifying:ifFail:logged:
Compiler class>>evaluate:for:notifying:logged:
Compiler class>>evaluate:for:logged:
Compiler class>>evaluate:logged:
DoItDeclaration>>import
CodeImporter>>evaluate in Block: [:decl | value := decl import]
OrderedCollection>>do:
CodeImporter>>evaluate
BasicCodeLoader>>installSourceFile: in Block: [codeImporter evaluate]
BlockClosure>>on:do:
BasicCodeLoader>>handleErrorsDuring:reference:
BasicCodeLoader>>installSourceFile:
BasicCodeLoader>>installSourceFiles in Block: [:reference | self installSourceFile: reference]
OrderedCollection>>do:
BasicCodeLoader>>installSourceFiles in Block: [sourceFiles...
BlockClosure>>ensure:
BasicCodeLoader>>installSourceFiles
BasicCodeLoader>>activate
BasicCodeLoader class(CommandLineHandler class)>>activateWith:
DefaultCommandLineHandler>>handleSubcommand
Got startup errors:
    OutOfMemory


Now, I did a purge.

Same results.

Do I need to pass a parameter or something?

This is on a fresh 2.0 image + vmLatest.

Phil

Re: [Pharo-dev] Exploring Heap Size Limits

philippeback
In reply to this post by Sven Van Caekenberghe-2
I did some more experiments and that's the most I can get from the script, no matter how much free memory there is on the machine.
Adding ./pharo --memory 2000m ... has no effect.


[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./pharo Pharo.image memtest.st 501 1024000
uptime 0h0m0s
memory 25,004,444 bytes
old 19,213,688 bytes (76.80000000000001%)
young 5,384 bytes (0.0%)
used 19,219,072 bytes (76.9%)
free 5,785,372 bytes (23.1%)
GCs 5 (62ms between GCs)
full 1 totalling 76ms (24.5% uptime), avg 76.0ms
incr 4 totalling 10ms (3.2% uptime), avg 2.5ms
tenures 0

allocating: 501 times 1024000 bytes
allocated: 513.02 MB
time to run 0:00:00:06.27
3 times GC
time to run 0:00:00:00.494
uptime 0h0m7s
memory 534,974,360 bytes
old 532,256,456 bytes (99.5%)
young 7,092 bytes (0.0%)
used 532,263,548 bytes (99.5%)
free 2,710,812 bytes (0.5%)
GCs 344 (21ms between GCs)
full 87 totalling 5,822ms (82.2% uptime), avg 66.9ms
incr 257 totalling 659ms (9.3% uptime), avg 2.6ms
tenures 0
Since last view 339 (20ms between GCs)
uptime 6.800000000000001s
full 86 totalling 5,746ms (84.9% uptime), avg 66.8ms
incr 253 totalling 649ms (9.600000000000001% uptime), avg 2.6ms
tenures 0

Re: [Pharo-dev] Exploring Heap Size Limits

Sven Van Caekenberghe-2

On 29 Jun 2013, at 13:18, [hidden email] wrote:

> I did some more experiments and that's the most I can get from the script, no matter how much free memory there is on the machine.
> Adding ./pharo --memory 2000m ... has no effect.

Yes, it seems it is not possible to allocate more than half a GB on Mac OS X. The VMs are obviously different on each platform.

BTW, I didn't know that get.pharo.org works so well on OS X too ;-)

Now, for server production usage only Linux counts anyway.

On Mac OS X, stability is way more important to me than heap size.

But it would be nice to hear something from our VM experts.

Sven



Re: [Pharo-dev] Exploring Heap Size Limits

Jan Vrany
In reply to this post by Sven Van Caekenberghe-2
Hi,

For a 32-bit application you can never allocate the whole 4GB, as some memory
has to be reserved for the kernel. The usual split is 2/2, sometimes 3/1. Plus,
the actual code must be somewhere in memory, as well as the stack.
Usually the real limit for a 32-bit application's heap is somewhere around 1.8GB.
With a 3/1 split you can have one more gig, but on Linux that would
require 3/1 support in the kernel. If I'm not mistaken, most stock
kernels come with a 2/2 split.
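
As a rough back-of-the-envelope sketch of that limit (the 250 MB overhead figure is just an illustrative assumption, not a measured number):

4096 - 2048.    "2/2 split: 2048 MB of the 4 GB address space left for user space"
2048 - 250.     "minus ~250 MB for VM binary, C heap, stacks and libraries: about 1.8 GB for one contiguous heap"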

Jan







Re: [Pharo-dev] Exploring Heap Size Limits

Sven Van Caekenberghe-2
Hi Jan,

On 29 Jun 2013, at 17:12, Jan Vrany <[hidden email]> wrote:

> Hi,
>
> For a 32-bit application you can never allocate the whole 4GB, as some memory has to be reserved for the kernel. The usual split is 2/2, sometimes 3/1. Plus,
> the actual code must be somewhere in memory, as well as the stack.
> Usually the real limit for a 32-bit application's heap is somewhere around 1.8GB.
> With a 3/1 split you can have one more gig, but on Linux that would
> require 3/1 support in the kernel. If I'm not mistaken, most stock kernels come with a 2/2 split.
>
> Jan

Thanks for the reply. Sounds pretty reasonable.

Would this also be the case for a 32-bit app running on a 64-bit OS?

Sven



Re: [Pharo-dev] Exploring Heap Size Limits

Jan Vrany
On 29/06/13 17:45, Sven Van Caekenberghe wrote:

> Hi Jan,
>
> Thanks for the reply. Sounds pretty reasonable.
>
> Would this also be the case for a 32-bit app running on a 64-bit OS ?

If I'm not much mistaken, yes. The crucial point is system calls that
need to pass data to the kernel (such as read()/write(), to name the most
obvious ones). At some point the process is in a funny state, memory-wise:
it is in kernel mode, already running the system call, but the CPU still has
the user process's page table active. At this point data is copied from
user memory to kernel memory (to those reserved 2 gigs, usually memory
above 0x7FFFFFFF). Then the kernel installs its own kernel page table,
which maps the very same physical page that was mapped in the
user process's page table to the address where the data was just
copied. That's how data is transferred between user space and
kernel space. Or at least that's my understanding of how it's done :-) I
was never particularly good at these low-level things.

Best, Jan




Re: [Pharo-dev] Exploring Heap Size Limits

philippeback

I've been changing the heap value in the Info.plist on OS X, and it is indeed possible to go higher.

The default was 536870912 (exactly 512 MB).


I changed it to 1536870912, and that indeed allows going higher.
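
For scale, a quick conversion of those two values (plain arithmetic, nothing VM-specific):

536870912 // (1024 * 1024).      "512, so the default is exactly 512 MB"
1536870912 // (1024 * 1024).     "1465, i.e. about 1.43 GB"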



Here is a sample:

[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./pharo Pharo.image memtest.st 1000 1024000
uptime 0h0m0s
memory 25,004,444 bytes
old 19,213,696 bytes (76.80000000000001%)
young 5,384 bytes (0.0%)
used 19,219,080 bytes (76.9%)
free 5,785,364 bytes (23.1%)
GCs 7 (101ms between GCs)
full 2 totalling 124ms (17.6% uptime), avg 62.0ms
incr 5 totalling 8ms (1.1% uptime), avg 1.6ms
tenures 0

allocating: 1000 times 1024000 bytes
allocated: 1.02 GB
time to run 0:00:00:59.776
3 times GC
time to run 0:00:00:00.73
uptime 0h1m1s
memory 1,045,961,948 bytes
old 1,043,240,448 bytes (99.7%)
young 7,552 bytes (0.0%)
used 1,043,248,000 bytes (99.7%)
free 2,713,948 bytes (0.30000000000000004%)
GCs 2,006 (31ms between GCs)
full 1001 totalling 60,190ms (98.30000000000001% uptime), avg 60.1ms
incr 1005 totalling 143ms (0.2% uptime), avg 0.1ms
tenures 0
Since last view 1,999 (30ms between GCs)
uptime 60.5s
full 999 totalling 60,066ms (99.30000000000001% uptime), avg 60.1ms
incr 1000 totalling 135ms (0.2% uptime), avg 0.1ms
tenures 0

Phil




Re: [Pharo-dev] Exploring Heap Size Limits

philippeback
In reply to this post by Jan Vrany
And at one point, even though there was free memory on the machine, things went south.

[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./pharo Pharo.image memtest.st 500 1024000
errno 12
mmap failed: Cannot allocate memory
[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./pharo Pharo.image memtest.st 1 1024000
errno 12
mmap failed: Cannot allocate memory

Phil

---
Philippe Back
Dramatic Performance Improvements
Mob: +32(0) 478 650 140 | Fax: +32 (0) 70 408 027
Blog: http://philippeback.be | Twitter: @philippeback

High Octane SPRL
rue cour Boisacq 101 | 1301 Bierges | Belgium

Featured on the Software Process and Measurement Cast
Sparx Systems Enterprise Architect and Ability Engineering EADocX Value Added Reseller
 





Re: [Pharo-dev] Exploring Heap Size Limits

philippeback
In reply to this post by Jan Vrany
An interesting run, as it does not end with a simple OutOfMemory but with a lot of weird output:

[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./pharo Pharo.image memtest.st 1200 1024000
uptime 0h0m0s
memory 25,004,456 bytes
old 19,213,080 bytes (76.80000000000001%)
young 4,584 bytes (0.0%)
used 19,217,664 bytes (76.9%)
free 5,786,792 bytes (23.1%)
GCs 7 (106ms between GCs)
full 2 totalling 156ms (21.1% uptime), avg 78.0ms
incr 5 totalling 10ms (1.3% uptime), avg 2.0ms
tenures 0

allocating: 1200 times 1024000 bytes
allocated: 1.23 GB
time to run 0:00:01:18.93
3 times GC
time to run 0:00:00:00.833
uptime 0h1m20s
memory -896,717,100 bytes
old -899,440,000 bytes (100.30000000000001%)
young 7,552 bytes (0.0%)
used -899,432,448 bytes (100.30000000000001%)
free 2,715,348 bytes (-0.30000000000000004%)
GCs 2,406 (33ms between GCs)
full 1201 totalling 79,315ms (98.5% uptime), avg 66.0ms
incr 1205 totalling 215ms (0.30000000000000004% uptime), avg 0.2ms
tenures 0
Since last view 2,399 (33ms between GCs)
uptime 79.80000000000001s
full 1199 totalling 79,159ms (99.2% uptime), avg 66.0ms
incr 1200 totalling 205ms (0.30000000000000004% uptime), avg 0.2ms
tenures 0

[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./pharo Pharo.image memtest.st 1500 1024000
uptime 0h0m0s
memory 25,004,444 bytes
old 19,213,696 bytes (76.80000000000001%)
young 5,384 bytes (0.0%)
used 19,219,080 bytes (76.9%)
free 5,785,364 bytes (23.1%)
GCs 7 (99ms between GCs)
full 2 totalling 129ms (18.7% uptime), avg 64.5ms
incr 5 totalling 9ms (1.3% uptime), avg 1.8ms
tenures 0

allocating: 1500 times 1024000 bytes

===============================================================================
Notice: Errors in script loaded from /Users/philippeback/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest/memtest.st
===============================================================================
Errors in script loaded from /Users/philippeback/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest/memtest.st
==== Startup Error: OutOfMemory
ByteArray class(Behavior)>>basicNew:
ByteArray class(Behavior)>>new:
UndefinedObject>>DoIt in Block: [t6...
SmallInteger(Integer)>>timesRepeat:
UndefinedObject>>DoIt in Block: [:t6 | t1...
Array class(SequenceableCollection class)>>new:streamContents:
UndefinedObject>>DoIt in Block: [t4 := Array...
Time class>>millisecondsToRun:
BlockClosure>>timeToRun
UndefinedObject>>DoIt
Compiler>>evaluate:in:to:notifying:ifFail:logged:
Compiler class>>evaluate:for:notifying:logged:
Compiler class>>evaluate:for:logged:
Compiler class>>evaluate:logged:
DoItDeclaration>>import
CodeImporter>>evaluate in Block: [:decl | value := decl import]
OrderedCollection>>do:
CodeImporter>>evaluate
BasicCodeLoader>>installSourceFile: in Block: [codeImporter evaluate]
BlockClosure>>on:do:
BasicCodeLoader>>handleErrorsDuring:reference:
BasicCodeLoader>>installSourceFile:
BasicCodeLoader>>installSourceFiles in Block: [:reference | self installSourceFile: reference]
OrderedCollection>>do:
BasicCodeLoader>>installSourceFiles in Block: [sourceFiles...
BlockClosure>>ensure:
BasicCodeLoader>>installSourceFiles
BasicCodeLoader>>activate
BasicCodeLoader class(CommandLineHandler class)>>activateWith:
DefaultCommandLineHandler>>handleSubcommand

out of memory



Smalltalk stack dump:
0xbffd3f38 M BlockClosure>ifError: 0x7aad665c: a(n) BlockClosure
0xbffd3f5c M [] in Semaphore>critical:ifError: 0x1f62f350: a(n) Semaphore
0xbffd3f7c M [] in Semaphore>critical: 0x1f62f350: a(n) Semaphore
0xbffd3f9c M BlockClosure>ensure: 0x7aad6578: a(n) BlockClosure
0xbffb15c4 M Semaphore>critical: 0x1f62f350: a(n) Semaphore
0xbffb15e4 M Semaphore>critical:ifError: 0x1f62f350: a(n) Semaphore
0xbffb1604 M WeakRegistry>protected: 0x1f62f318: a(n) WeakRegistry
0xbffb162c M WeakRegistry>finalizeValues 0x1f62f318: a(n) WeakRegistry
0xbffb1648 M [] in WeakArray class>finalizationProcess 0x1f7fd4e0: a(n) WeakArray class
0xbffb1664 M BlockClosure>on:do: 0x7aad6310: a(n) BlockClosure
0xbffb1684 M BlockClosure>on:fork: 0x7aad6310: a(n) BlockClosure
0xbffb16a4 M [] in WeakArray class>finalizationProcess 0x1f7fd4e0: a(n) WeakArray class
0xbffb16c8 M WeakArray(SequenceableCollection)>do: 0x1f502290: a(n) WeakArray
0xbffb16e4 M [] in WeakArray class>finalizationProcess 0x1f7fd4e0: a(n) WeakArray class
0xbffb1704 M [] in Semaphore>critical: 0x2074d594: a(n) Semaphore
0xbffb1724 M BlockClosure>ensure: 0x7aad621c: a(n) BlockClosure
0xbffb1744 M Semaphore>critical: 0x2074d594: a(n) Semaphore
0xbffb1760 M WeakArray class>finalizationProcess 0x1f7fd4e0: a(n) WeakArray class
0xbffb1780 I [] in WeakArray class>restartFinalizationProcess 0x1f7fd4e0: a(n) WeakArray class
0xbffb17a0 I [] in BlockClosure>newProcess 0x2074d600: a(n) BlockClosure

Most recent primitives
shallowCopy (repeated 68 times)
wait
wait
signal
wait
signal
wait
signal
signal
basicNew:
basicNew
class
at:put:
wait
shallowCopy (repeated 102 times)
wait
wait
signal
wait
signal
wait
signal
signal
basicNew:
basicNew
class
at:put:
wait
shallowCopy (repeated 24 times)
wait
wait
signal
wait
signal
wait
signal
signal
basicNew:
basicNew
class
at:put:
wait
shallowCopy (repeated 21 times)
wait
wait
./pharo: line 11: 41423 Abort trap: 6           "$DIR"/"pharo-vm/Pharo.app/Contents/MacOS/Pharo" --headless "$@"
[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$]
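
As an aside, the allocation step in memtest.st can be wrapped so that a failed run at least reports how far it got before dying. A minimal sketch (untested here, assuming the OutOfMemory error shown in the traceback above can be handled from image code; it will not necessarily prevent the later abort trap):

| count size data allocated |
count := Smalltalk commandLine arguments first asInteger.
size := Smalltalk commandLine arguments second asInteger.
allocated := 0.
"Same allocation loop as memtest.st, but counting successes and trapping OutOfMemory."
[ data := Array new: count streamContents: [ :out |
    count timesRepeat: [
      out nextPut: (ByteArray new: size).
      allocated := allocated + 1 ] ] ]
  on: OutOfMemory
  do: [ :error |
    Transcript show: ('gave up after {1} of {2} allocations' format: { allocated. count }); cr ].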

---
Philippe Back
Dramatic Performance Improvements
Mob: +32(0) 478 650 140 | Fax: +32 (0) 70 408 027
Blog: http://philippeback.be | Twitter: @philippeback

High Octane SPRL
rue cour Boisacq 101 | 1301 Bierges | Belgium

Featured on the Software Process and Measurement Cast
Sparx Systems Enterprise Architect and Ability Engineering EADocX Value Added Reseller
 



On Sat, Jun 29, 2013 at 8:22 PM, Jan Vrany <[hidden email]> wrote:
On 29/06/13 17:45, Sven Van Caekenberghe wrote:
Hi Jan,

On 29 Jun 2013, at 17:12, Jan Vrany <[hidden email]> wrote:

Hi,

for a 32-bit application you can never allocate the whole 4GB, as some memory has to be reserved for the kernel. The usual split is 2/2, sometimes 3/1. Plus, the actual code must be somewhere in memory, as well as the stack.
Usually the real limit for a 32-bit application is somewhere around 1.8GB for the heap. With a 3/1 split you can have one more gig, but for Linux that would require 3/1 support in the kernel. If I'm not mistaken, most stock kernels come with a 2/2 split.

Jan

Thanks for the reply. Sounds pretty reasonable.

Would this also be the case for a 32-bit app running on a 64-bit OS?

If I'm not much mistaken, yes. The crucial point is system calls that need to pass data to the kernel (such as read()/write(), to name the most obvious ones). At some point the process is in a funny state, memory-wise: it is already in kernel mode, running the system call, but the CPU still has the user process's page table active. At this point data are copied from user memory to kernel memory (to those reserved 2 gigs, usually the memory above 0x7FFFFFFF). Then the kernel installs its own page table, which maps the very same physical page that was mapped in the user process's page table to the address where the data were just copied. That's how data are transferred between user space and kernel space. Or at least that's my understanding of how it's done :-) I was never particularly good at these low-level things.

Best, Jan


Sven
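
A rough back-of-the-envelope check of Jan's numbers (ballpark figures only; the ~250 MB of overhead is an assumption, not a measurement):

   4096 MB   32-bit address space
 - 2048 MB   reserved for the kernel with a 2/2 split
 -  250 MB   VM executable, C heap, thread stacks, shared libraries (rough guess)
   -------
  ~1800 MB   left for one contiguous object-memory mapping

which is in line with the ~1.8GB practical heap limit Jan mentions, and with the 1792m -mmap run working while larger settings fail.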


Re: [Pharo-dev] Exploring Heap Size Limits

philippeback
In reply to this post by Jan Vrany
Other than that, it is possible to go quite far:

[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./pharo Pharo.image memtest.st 1400 1024000
uptime 0h0m0s
memory 25,004,460 bytes
old 19,212,804 bytes (76.80000000000001%)
young 4,584 bytes (0.0%)
used 19,217,388 bytes (76.9%)
free 5,787,072 bytes (23.1%)
GCs 7 (100ms between GCs)
full 2 totalling 132ms (18.8% uptime), avg 66.0ms
incr 5 totalling 9ms (1.3% uptime), avg 1.8ms
tenures 0

allocating: 1400 times 1024000 bytes
allocated: 1.43 GB
time to run 0:00:01:21.708
3 times GC
time to run 0:00:00:00.944
uptime 0h1m23s
memory -691,912,800 bytes
old -694,637,076 bytes (100.4%)
young 7,184 bytes (0.0%)
used -694,629,892 bytes (100.4%)
free 2,717,092 bytes (-0.4%)
GCs 2,806 (30ms between GCs)
full 1401 totalling 82,194ms (98.60000000000001% uptime), avg 58.7ms
incr 1405 totalling 209ms (0.30000000000000004% uptime), avg 0.1ms
tenures 0
Since last view 2,799 (30ms between GCs)
uptime 82.7s
full 1399 totalling 82,062ms (99.30000000000001% uptime), avg 58.7ms
incr 1400 totalling 200ms (0.2% uptime), avg 0.1ms
tenures 0

