[Pharo-dev] Exploring Heap Size Limits


[Pharo-dev] Exploring Heap Size Limits

Sven Van Caekenberghe-2
Hi,

I wrote a little test to explore the heap size limits of the current Pharo image+VM combination. I did some tests on an 8 GB 64-bit Linux machine, and the basic conclusion is that with the standard 1 GB heap size you can allocate close to 1 GB of actual data and still GC it. I think that is pretty good news. The machine has 4 effective cores, so it could easily run 4 busy images like that.

Pushing the limits further, things get weird, up to the point of crashing. Sadly, heap sizes of 2, 3 or 4 GB, which should theoretically be possible, fail, thus limiting the amount of data that can be allocated and used.

Anyway, my goal is to start the debate around this topic ;-)


Here is the test script:

t3@coolermaster:smalltalk$ cat memtest.st

NonInteractiveTranscript stdout install.

!

Transcript show: 'memtest.st'; cr.
Smalltalk garbageCollect.
Transcript show: SmalltalkImage current vm statisticsReport; cr.

!

| count size data timeToRun |
count := Smalltalk commandLine arguments first asInteger.
size := Smalltalk commandLine arguments second asInteger.
Transcript show: ('allocating: {1} times {2} bytes' format: { count. size }); cr.
timeToRun := [
  data := Array new: count streamContents: [ :out |
    count timesRepeat: [
      out nextPut: (ByteArray new: size) ] ].
] timeToRun.
Transcript
  show: ('allocated: {1}' format: { (data collect: #size) sum humanReadableSIByteSize });
  cr.
Transcript show: ('time to run {1}' format: { Duration milliSeconds: timeToRun }); cr.
Transcript show: '3 times GC'; cr.
timeToRun := [
  3 timesRepeat: [ Smalltalk garbageCollect ] ] timeToRun.
Transcript show: ('time to run {1}' format: { Duration milliSeconds: timeToRun }); cr.
Transcript show: SmalltalkImage current vm statisticsReport; cr.

!

SmalltalkImage current quitPrimitive.


And here is a normal result running get.pharo.org/20+vm

$ ./pharo Pharo.image memtest.st 512 1024000

memtest.st
uptime 0h0m0s
memory 25,076,708 bytes
        old 19,213,624 bytes (76.60000000000001%)
        young 5,384 bytes (0.0%)
        used 19,219,008 bytes (76.60000000000001%)
        free 5,857,700 bytes (23.400000000000002%)
GCs 10 (12ms between GCs)
        full 1 totalling 30ms (25.0% uptime), avg 30.0ms
        incr 9 totalling 6ms (5.0% uptime), avg 0.7000000000000001ms
        tenures 0

allocating: 512 times 1024000 bytes
allocated: 524.29 MB
time to run 0:00:00:02.852
3 times GC
time to run 0:00:00:00.196
uptime 0h0m3s
memory 546,746,364 bytes
        old 543,520,476 bytes (99.4%)
        young 7,184 bytes (0.0%)
        used 543,527,660 bytes (99.4%)
        free 3,218,704 bytes (0.6000000000000001%)
GCs 356 (9ms between GCs)
        full 89 totalling 2,743ms (86.4% uptime), avg 30.8ms
        incr 267 totalling 223ms (7.0% uptime), avg 0.8ms
        tenures 0
Since last view 346 (9ms between GCs)
        uptime 3.1s
        full 88 totalling 2,713ms (88.80000000000001% uptime), avg 30.8ms
        incr 258 totalling 217ms (7.1000000000000005% uptime), avg 0.8ms
        tenures 0


It takes about 2.9 s to allocate about 512 MB. A full GC then takes about 30 ms.

You can play a bit with the 2 parameters to generate different datasets.

Not changing any VM parameters, you can successfully do something like

$ ./pharo Pharo.image memtest.st 1000 1024000

or

$ ./pharo Pharo.image memtest.st 1024000 1000

thus allocating about 1 GB in each case.
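
Both invocations allocate the same total number of bytes, just in different object shapes; a quick check:

1000 * 1024000.      "1,024,000,000 bytes as a thousand ~1 MB ByteArrays"
1024000 * 1000.      "the same total, as roughly a million 1 KB ByteArrays"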

Playing with -mmap gives some weird results and unexpected slowness as well. Consider (the default -mmap is 1024m):

$ ./pharo -mmap 1280m Pharo.image memtest.st 500 1024000

memtest.st
uptime 0h0m0s
memory 25,076,716 bytes
        old 19,213,480 bytes (76.60000000000001%)
        young 5,384 bytes (0.0%)
        used 19,218,864 bytes (76.60000000000001%)
        free 5,857,852 bytes (23.400000000000002%)
GCs 12 (13ms between GCs)
        full 2 totalling 56ms (37.1% uptime), avg 28.0ms
        incr 10 totalling 6ms (4.0% uptime), avg 0.6000000000000001ms
        tenures 0

allocating: 500 times 1024000 bytes
allocated: 512.00 MB
time to run 0:00:00:13.66
3 times GC
time to run 0:00:00:00.192
uptime 0h0m14s
memory 533,946,384 bytes
        old 531,232,232 bytes (99.5%)
        young 7,644 bytes (0.0%)
        used 531,239,876 bytes (99.5%)
        free 2,706,508 bytes (0.5%)
GCs 1,011 (14ms between GCs)
        full 501 totalling 13,819ms (98.7% uptime), avg 27.6ms
        incr 510 totalling 35ms (0.2% uptime), avg 0.1ms
        tenures 0
Since last view 999 (14ms between GCs)
        uptime 13.9s
        full 499 totalling 13,763ms (99.30000000000001% uptime), avg 27.6ms
        incr 500 totalling 29ms (0.2% uptime), avg 0.1ms
        tenures 0


So increasing the heap from 1024 MB to 1280 MB slows the allocation down to about 14 s; the statistics suggest nearly all of that time goes into full garbage collections (501 full GCs totalling 13.8 s over the run).

Increasing the heap size to 1792 MB gives the following result:

t3@coolermaster:smalltalk$ ./pharo -mmap 1792m Pharo.image memtest.st 1280 1048576

memtest.st
uptime 0h0m0s
memory 25,076,708 bytes
        old 19,213,592 bytes (76.60000000000001%)
        young 5,384 bytes (0.0%)
        used 19,218,976 bytes (76.60000000000001%)
        free 5,857,732 bytes (23.400000000000002%)
GCs 12 (12ms between GCs)
        full 2 totalling 55ms (37.9% uptime), avg 27.5ms
        incr 10 totalling 6ms (4.1000000000000005% uptime), avg 0.6000000000000001ms
        tenures 0

allocating: 1280 times 1048576 bytes
allocated: 1.34 GB
time to run 0:00:00:35.258
3 times GC
time to run 0:00:00:00.37
uptime 0h0m35s
memory -783,340,292 bytes
        old -786,061,544 bytes (100.30000000000001%)
        young 7,460 bytes (0.0%)
        used -786,054,084 bytes (100.30000000000001%)
        free 2,713,792 bytes (-0.30000000000000004%)
GCs 2,572 (14ms between GCs)
        full 1281 totalling 35,444ms (99.10000000000001% uptime), avg 27.700000000000003ms
        incr 1291 totalling 93ms (0.30000000000000004% uptime), avg 0.1ms
        tenures 0
Since last view 2,560 (14ms between GCs)
        uptime 35.6s
        full 1279 totalling 35,389ms (99.30000000000001% uptime), avg 27.700000000000003ms
        incr 1281 totalling 87ms (0.2% uptime), avg 0.1ms
        tenures 0


The allocation takes a lot longer, but the GC times are stable.

I suspect the negative numbers in the report have to do with SmallInteger limitations, but the question is whether it is only a cosmetic problem or not.
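
A quick workspace check supports that suspicion (a sketch, assuming the statistics wrap around at the 31-bit signed boundary of a 32-bit image's SmallInteger representation):

SmallInteger maxVal.              "1073741823 on a 32-bit image, i.e. (2 raisedTo: 30) - 1"
-783340292 + (2 raisedTo: 31).    "1364143356, roughly the 1.34 GB of data plus the ~20 MB base image"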

Again, I am not complaining, just exploring/wondering.


Sven



Re: [Pharo-dev] Exploring Heap Size Limits

philippeback
On OS X Lion with 8 GB RAM, the first call gives me:

[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./run.sh
uptime 0h0m0s
memory 25,004,440 bytes
old 19,213,728 bytes (76.80000000000001%)
young 5,384 bytes (0.0%)
used 19,219,112 bytes (76.9%)
free 5,785,328 bytes (23.1%)
GCs 5 (127ms between GCs)
full 1 totalling 72ms (11.4% uptime), avg 72.0ms
incr 4 totalling 11ms (1.7000000000000002% uptime), avg 2.8000000000000003ms
tenures 0

allocating: 512 times 1024000 bytes

===============================================================================
Notice: Errors in script loaded from /Users/philippeback/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest/memtest.st
===============================================================================
Errors in script loaded from /Users/philippeback/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest/memtest.st
==== Startup Error: OutOfMemory
ByteArray class(Behavior)>>basicNew:
ByteArray class(Behavior)>>new:
UndefinedObject>>DoIt in Block: [t6...
SmallInteger(Integer)>>timesRepeat:
UndefinedObject>>DoIt in Block: [:t6 | t1...
Array class(SequenceableCollection class)>>new:streamContents:
UndefinedObject>>DoIt in Block: [t4 := Array...
Time class>>millisecondsToRun:
BlockClosure>>timeToRun
UndefinedObject>>DoIt
Compiler>>evaluate:in:to:notifying:ifFail:logged:
Compiler class>>evaluate:for:notifying:logged:
Compiler class>>evaluate:for:logged:
Compiler class>>evaluate:logged:
DoItDeclaration>>import
CodeImporter>>evaluate in Block: [:decl | value := decl import]
OrderedCollection>>do:
CodeImporter>>evaluate
BasicCodeLoader>>installSourceFile: in Block: [codeImporter evaluate]
BlockClosure>>on:do:
BasicCodeLoader>>handleErrorsDuring:reference:
BasicCodeLoader>>installSourceFile:
BasicCodeLoader>>installSourceFiles in Block: [:reference | self installSourceFile: reference]
OrderedCollection>>do:
BasicCodeLoader>>installSourceFiles in Block: [sourceFiles...
BlockClosure>>ensure:
BasicCodeLoader>>installSourceFiles
BasicCodeLoader>>activate
BasicCodeLoader class(CommandLineHandler class)>>activateWith:
DefaultCommandLineHandler>>handleSubcommand
Got startup errors:
    OutOfMemory


Now, I did a purge.

Same results.

Do I need to pass a parameter or something?

This is on a fresh 2.0 image + vmLatest.

Phil

Re: [Pharo-dev] Exploring Heap Size Limits

philippeback
In reply to this post by Sven Van Caekenberghe-2
I did some more experiments and that's the most I can get from the script, no matter how much free memory there is on the machine.
Adding ./pharo --memory 2000m ... has no effect.


[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./pharo Pharo.image memtest.st 501 1024000
uptime 0h0m0s
memory 25,004,444 bytes
old 19,213,688 bytes (76.80000000000001%)
young 5,384 bytes (0.0%)
used 19,219,072 bytes (76.9%)
free 5,785,372 bytes (23.1%)
GCs 5 (62ms between GCs)
full 1 totalling 76ms (24.5% uptime), avg 76.0ms
incr 4 totalling 10ms (3.2% uptime), avg 2.5ms
tenures 0

allocating: 501 times 1024000 bytes
allocated: 513.02 MB
time to run 0:00:00:06.27
3 times GC
time to run 0:00:00:00.494
uptime 0h0m7s
memory 534,974,360 bytes
old 532,256,456 bytes (99.5%)
young 7,092 bytes (0.0%)
used 532,263,548 bytes (99.5%)
free 2,710,812 bytes (0.5%)
GCs 344 (21ms between GCs)
full 87 totalling 5,822ms (82.2% uptime), avg 66.9ms
incr 257 totalling 659ms (9.3% uptime), avg 2.6ms
tenures 0
Since last view 339 (20ms between GCs)
uptime 6.800000000000001s
full 86 totalling 5,746ms (84.9% uptime), avg 66.8ms
incr 253 totalling 649ms (9.600000000000001% uptime), avg 2.6ms
tenures 0

Re: [Pharo-dev] Exploring Heap Size Limits

Sven Van Caekenberghe-2

On 29 Jun 2013, at 13:18, [hidden email] wrote:

> I did some more experiments and that's the most I can get from the script, no matter how much free memory there is on the machine.
> Adding ./pharo --memory 2000m ... has no effect.

Yes, it seems it is not possible to allocate more than half a GB on Mac OS X. The VMs are obviously different on each platform.

BTW, I didn't know that get.pharo.org works so well on OS X too ;-)

Now, for server production usage only Linux counts anyway.

On Mac OS X, stability is way more important to me than heap size.

But it would be nice to hear something from our VM experts.

Sven



Re: [Pharo-dev] Exploring Heap Size Limits

Jan Vrany
In reply to this post by Sven Van Caekenberghe-2
Hi,

For a 32-bit application you can never allocate the whole 4GB, as some memory
has to be reserved for the kernel. The usual split is 2/2, sometimes 3/1. Plus,
the actual code must be somewhere in memory, as well as the stack.
Usually the real limit for a 32-bit application's heap is somewhere around 1.8GB.
With a 3/1 split you can have one more gig, but on Linux that would
require 3/1 support in the kernel. If I'm not mistaken, most stock
kernels come with a 2/2 split.
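
As a rough back-of-the-envelope sketch of that limit (the 250 MB overhead figure is just an illustrative assumption, not a measured number):

4096 - 2048.    "2/2 split: 2048 MB of the 4 GB address space left for user space"
2048 - 250.     "minus ~250 MB for VM binary, C heap, stacks and libraries: about 1.8 GB for one contiguous heap"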

Jan







Re: [Pharo-dev] Exploring Heap Size Limits

Sven Van Caekenberghe-2
Hi Jan,

On 29 Jun 2013, at 17:12, Jan Vrany <[hidden email]> wrote:

> Hi,
>
> For a 32-bit application you can never allocate the whole 4GB, as some memory has to be reserved for the kernel. The usual split is 2/2, sometimes 3/1. Plus,
> the actual code must be somewhere in memory, as well as the stack.
> Usually the real limit for a 32-bit application's heap is somewhere around 1.8GB.
> With a 3/1 split you can have one more gig, but on Linux that would
> require 3/1 support in the kernel. If I'm not mistaken, most stock kernels come with a 2/2 split.
>
> Jan

Thanks for the reply. Sounds pretty reasonable.

Would this also be the case for a 32-bit app running on a 64-bit OS?

Sven



Re: [Pharo-dev] Exploring Heap Size Limits

Jan Vrany
On 29/06/13 17:45, Sven Van Caekenberghe wrote:

> Hi Jan,
>
> Thanks for the reply. Sounds pretty reasonable.
>
> Would this also be the case for a 32-bit app running on a 64-bit OS ?

If I'm not much mistaken, yes. The crucial point is system calls that
need to pass data to the kernel (such as read()/write(), to name the most
obvious ones). At some point the process is in a funny state, memory-wise:
it is in kernel mode, already running the system call, but the CPU still has
the user process's page table active. At this point data is copied from
user memory to kernel memory (to those reserved 2 gigs, usually memory
above 0x7FFFFFFF). Then the kernel installs its own kernel page table,
which maps the very same physical page that was mapped in the
user process's page table to the address where the data was just
copied. That's how data is transferred between user space and
kernel space. Or at least that's my understanding of how it's done :-) I
was never particularly good at these low-level things.

Best, Jan




Re: [Pharo-dev] Exploring Heap Size Limits

philippeback

I've been changing the heap value in the Info.plist on OS X, and it is indeed possible to go higher.

The default was 536870912 (exactly 512 MB).


I changed it to 1536870912, and that indeed allows going higher.
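
For scale, a quick conversion of those two values (plain arithmetic, nothing VM-specific):

536870912 // (1024 * 1024).      "512, so the default is exactly 512 MB"
1536870912 // (1024 * 1024).     "1465, i.e. about 1.43 GB"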



Here is a sample:

[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./pharo Pharo.image memtest.st 1000 1024000
uptime 0h0m0s
memory 25,004,444 bytes
old 19,213,696 bytes (76.80000000000001%)
young 5,384 bytes (0.0%)
used 19,219,080 bytes (76.9%)
free 5,785,364 bytes (23.1%)
GCs 7 (101ms between GCs)
full 2 totalling 124ms (17.6% uptime), avg 62.0ms
incr 5 totalling 8ms (1.1% uptime), avg 1.6ms
tenures 0

allocating: 1000 times 1024000 bytes
allocated: 1.02 GB
time to run 0:00:00:59.776
3 times GC
time to run 0:00:00:00.73
uptime 0h1m1s
memory 1,045,961,948 bytes
old 1,043,240,448 bytes (99.7%)
young 7,552 bytes (0.0%)
used 1,043,248,000 bytes (99.7%)
free 2,713,948 bytes (0.30000000000000004%)
GCs 2,006 (31ms between GCs)
full 1001 totalling 60,190ms (98.30000000000001% uptime), avg 60.1ms
incr 1005 totalling 143ms (0.2% uptime), avg 0.1ms
tenures 0
Since last view 1,999 (30ms between GCs)
uptime 60.5s
full 999 totalling 60,066ms (99.30000000000001% uptime), avg 60.1ms
incr 1000 totalling 135ms (0.2% uptime), avg 0.1ms
tenures 0

Phil




Re: [Pharo-dev] Exploring Heap Size Limits

philippeback
In reply to this post by Jan Vrany
And at one point, even though there was free memory on the machine, things went south.

[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./pharo Pharo.image memtest.st 500 1024000
errno 12
mmap failed: Cannot allocate memory
[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./pharo Pharo.image memtest.st 1 1024000
errno 12
mmap failed: Cannot allocate memory

Phil

---
Philippe Back
Dramatic Performance Improvements
Mob: +32(0) 478 650 140 | Fax: +32 (0) 70 408 027
Blog: http://philippeback.be | Twitter: @philippeback

High Octane SPRL
rue cour Boisacq 101 | 1301 Bierges | Belgium

Featured on the Software Process and Measurement Cast
Sparx Systems Enterprise Architect and Ability Engineering EADocX Value Added Reseller
 





Re: [Pharo-dev] Exploring Heap Size Limits

philippeback
In reply to this post by Jan Vrany
An interesting run, as it does not end with a simple OutOfMemory but with a lot of weird output:

[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./pharo Pharo.image memtest.st 1200 1024000
uptime 0h0m0s
memory 25,004,456 bytes
old 19,213,080 bytes (76.80000000000001%)
young 4,584 bytes (0.0%)
used 19,217,664 bytes (76.9%)
free 5,786,792 bytes (23.1%)
GCs 7 (106ms between GCs)
full 2 totalling 156ms (21.1% uptime), avg 78.0ms
incr 5 totalling 10ms (1.3% uptime), avg 2.0ms
tenures 0

allocating: 1200 times 1024000 bytes
allocated: 1.23 GB
time to run 0:00:01:18.93
3 times GC
time to run 0:00:00:00.833
uptime 0h1m20s
memory -896,717,100 bytes
old -899,440,000 bytes (100.30000000000001%)
young 7,552 bytes (0.0%)
used -899,432,448 bytes (100.30000000000001%)
free 2,715,348 bytes (-0.30000000000000004%)
GCs 2,406 (33ms between GCs)
full 1201 totalling 79,315ms (98.5% uptime), avg 66.0ms
incr 1205 totalling 215ms (0.30000000000000004% uptime), avg 0.2ms
tenures 0
Since last view 2,399 (33ms between GCs)
uptime 79.80000000000001s
full 1199 totalling 79,159ms (99.2% uptime), avg 66.0ms
incr 1200 totalling 205ms (0.30000000000000004% uptime), avg 0.2ms
tenures 0

[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./pharo Pharo.image memtest.st 1500 1024000
uptime 0h0m0s
memory 25,004,444 bytes
old 19,213,696 bytes (76.80000000000001%)
young 5,384 bytes (0.0%)
used 19,219,080 bytes (76.9%)
free 5,785,364 bytes (23.1%)
GCs 7 (99ms between GCs)
full 2 totalling 129ms (18.7% uptime), avg 64.5ms
incr 5 totalling 9ms (1.3% uptime), avg 1.8ms
tenures 0

allocating: 1500 times 1024000 bytes

===============================================================================
Notice: Errors in script loaded from /Users/philippeback/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest/memtest.st
===============================================================================
Errors in script loaded from /Users/philippeback/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest/memtest.st
==== Startup Error: OutOfMemory
ByteArray class(Behavior)>>basicNew:
ByteArray class(Behavior)>>new:
UndefinedObject>>DoIt in Block: [t6...
SmallInteger(Integer)>>timesRepeat:
UndefinedObject>>DoIt in Block: [:t6 | t1...
Array class(SequenceableCollection class)>>new:streamContents:
UndefinedObject>>DoIt in Block: [t4 := Array...
Time class>>millisecondsToRun:
BlockClosure>>timeToRun
UndefinedObject>>DoIt
Compiler>>evaluate:in:to:notifying:ifFail:logged:
Compiler class>>evaluate:for:notifying:logged:
Compiler class>>evaluate:for:logged:
Compiler class>>evaluate:logged:
DoItDeclaration>>import
CodeImporter>>evaluate in Block: [:decl | value := decl import]
OrderedCollection>>do:
CodeImporter>>evaluate
BasicCodeLoader>>installSourceFile: in Block: [codeImporter evaluate]
BlockClosure>>on:do:
BasicCodeLoader>>handleErrorsDuring:reference:
BasicCodeLoader>>installSourceFile:
BasicCodeLoader>>installSourceFiles in Block: [:reference | self installSourceFile: reference]
OrderedCollection>>do:
BasicCodeLoader>>installSourceFiles in Block: [sourceFiles...
BlockClosure>>ensure:
BasicCodeLoader>>installSourceFiles
BasicCodeLoader>>activate
BasicCodeLoader class(CommandLineHandler class)>>activateWith:
DefaultCommandLineHandler>>handleSubcommand

out of memory



Smalltalk stack dump:
0xbffd3f38 M BlockClosure>ifError: 0x7aad665c: a(n) BlockClosure
0xbffd3f5c M [] in Semaphore>critical:ifError: 0x1f62f350: a(n) Semaphore
0xbffd3f7c M [] in Semaphore>critical: 0x1f62f350: a(n) Semaphore
0xbffd3f9c M BlockClosure>ensure: 0x7aad6578: a(n) BlockClosure
0xbffb15c4 M Semaphore>critical: 0x1f62f350: a(n) Semaphore
0xbffb15e4 M Semaphore>critical:ifError: 0x1f62f350: a(n) Semaphore
0xbffb1604 M WeakRegistry>protected: 0x1f62f318: a(n) WeakRegistry
0xbffb162c M WeakRegistry>finalizeValues 0x1f62f318: a(n) WeakRegistry
0xbffb1648 M [] in WeakArray class>finalizationProcess 0x1f7fd4e0: a(n) WeakArray class
0xbffb1664 M BlockClosure>on:do: 0x7aad6310: a(n) BlockClosure
0xbffb1684 M BlockClosure>on:fork: 0x7aad6310: a(n) BlockClosure
0xbffb16a4 M [] in WeakArray class>finalizationProcess 0x1f7fd4e0: a(n) WeakArray class
0xbffb16c8 M WeakArray(SequenceableCollection)>do: 0x1f502290: a(n) WeakArray
0xbffb16e4 M [] in WeakArray class>finalizationProcess 0x1f7fd4e0: a(n) WeakArray class
0xbffb1704 M [] in Semaphore>critical: 0x2074d594: a(n) Semaphore
0xbffb1724 M BlockClosure>ensure: 0x7aad621c: a(n) BlockClosure
0xbffb1744 M Semaphore>critical: 0x2074d594: a(n) Semaphore
0xbffb1760 M WeakArray class>finalizationProcess 0x1f7fd4e0: a(n) WeakArray class
0xbffb1780 I [] in WeakArray class>restartFinalizationProcess 0x1f7fd4e0: a(n) WeakArray class
0xbffb17a0 I [] in BlockClosure>newProcess 0x2074d600: a(n) BlockClosure

Most recent primitives
shallowCopy (repeated 68 times)
wait
wait
signal
wait
signal
wait
signal
signal
basicNew:
basicNew
class
at:put:
wait
shallowCopy (repeated 102 times)
wait
wait
signal
wait
signal
wait
signal
signal
basicNew:
basicNew
class
at:put:
wait
shallowCopy (repeated 24 times)
wait
wait
signal
wait
signal
wait
signal
signal
basicNew:
basicNew
class
at:put:
wait
shallowCopy (repeated 21 times)
wait
wait
./pharo: line 11: 41423 Abort trap: 6           "$DIR"/"pharo-vm/Pharo.app/Contents/MacOS/Pharo" --headless "$@"
[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$]
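
As an aside, the allocation step in memtest.st can be wrapped so that a failed run at least reports how far it got before dying. A minimal sketch (untested here, assuming the OutOfMemory error shown in the traceback above can be handled from image code; it will not necessarily prevent the later abort trap):

| count size data allocated |
count := Smalltalk commandLine arguments first asInteger.
size := Smalltalk commandLine arguments second asInteger.
allocated := 0.
"Same allocation loop as memtest.st, but counting successes and trapping OutOfMemory."
[ data := Array new: count streamContents: [ :out |
    count timesRepeat: [
      out nextPut: (ByteArray new: size).
      allocated := allocated + 1 ] ] ]
  on: OutOfMemory
  do: [ :error |
    Transcript show: ('gave up after {1} of {2} allocations' format: { allocated. count }); cr ].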

---
Philippe Back
Dramatic Performance Improvements
Mob: +32(0) 478 650 140 | Fax: +32 (0) 70 408 027
Blog: http://philippeback.be | Twitter: @philippeback

High Octane SPRL
rue cour Boisacq 101 | 1301 Bierges | Belgium

Featured on the Software Process and Measurement Cast
Sparx Systems Enterprise Architect and Ability Engineering EADocX Value Added Reseller
 



On Sat, Jun 29, 2013 at 8:22 PM, Jan Vrany <[hidden email]> wrote:
On 29/06/13 17:45, Sven Van Caekenberghe wrote:
Hi Jan,

On 29 Jun 2013, at 17:12, Jan Vrany <[hidden email]> wrote:

Hi,

for a 32-bit application you can never allocate the whole 4GB, as some memory has to be reserved for the kernel. The usual split is 2/2, sometimes 3/1. Plus, the actual code must be somewhere in memory, as well as the stack.
Usually the real limit for a 32-bit application is somewhere around 1.8GB for the heap. With a 3/1 split you can have one more gig, but for Linux that would require 3/1 support in the kernel. If I'm not mistaken, most stock kernels come with a 2/2 split.

Jan

Thanks for the reply. Sounds pretty reasonable.

Would this also be the case for a 32-bit app running on a 64-bit OS?

If I'm not much mistaken, yes. The crucial point is system calls that need to pass data to the kernel (such as read()/write(), to name the most obvious ones). At some point the process is in a funny state, memory-wise: it is already in kernel mode, running the system call, but the CPU still has the user process's page table active. At this point data are copied from user memory to kernel memory (to those reserved 2 gigs, usually the memory above 0x7FFFFFFF). Then the kernel installs its own page table, which maps the very same physical page that was mapped in the user process's page table to the address where the data were just copied. That's how data are transferred between user space and kernel space. Or at least that's my understanding of how it's done :-) I was never particularly good at these low-level things.

Best, Jan


Sven
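
A rough back-of-the-envelope check of Jan's numbers (ballpark figures only; the ~250 MB of overhead is an assumption, not a measurement):

   4096 MB   32-bit address space
 - 2048 MB   reserved for the kernel with a 2/2 split
 -  250 MB   VM executable, C heap, thread stacks, shared libraries (rough guess)
   -------
  ~1800 MB   left for one contiguous object-memory mapping

which is in line with the ~1.8GB practical heap limit Jan mentions, and with the 1792m -mmap run working while larger settings fail.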


Re: [Pharo-dev] Exploring Heap Size Limits

philippeback
In reply to this post by Jan Vrany
Other than that, it is possible to go quite far:

[PhilMac:~/Documents/Smalltalk/2-MyWorkspaces/workspaceMemTest philippeback$] ./pharo Pharo.image memtest.st 1400 1024000
uptime 0h0m0s
memory 25,004,460 bytes
old 19,212,804 bytes (76.80000000000001%)
young 4,584 bytes (0.0%)
used 19,217,388 bytes (76.9%)
free 5,787,072 bytes (23.1%)
GCs 7 (100ms between GCs)
full 2 totalling 132ms (18.8% uptime), avg 66.0ms
incr 5 totalling 9ms (1.3% uptime), avg 1.8ms
tenures 0

allocating: 1400 times 1024000 bytes
allocated: 1.43 GB
time to run 0:00:01:21.708
3 times GC
time to run 0:00:00:00.944
uptime 0h1m23s
memory -691,912,800 bytes
old -694,637,076 bytes (100.4%)
young 7,184 bytes (0.0%)
used -694,629,892 bytes (100.4%)
free 2,717,092 bytes (-0.4%)
GCs 2,806 (30ms between GCs)
full 1401 totalling 82,194ms (98.60000000000001% uptime), avg 58.7ms
incr 1405 totalling 209ms (0.30000000000000004% uptime), avg 0.1ms
tenures 0
Since last view 2,799 (30ms between GCs)
uptime 82.7s
full 1399 totalling 82,062ms (99.30000000000001% uptime), avg 58.7ms
incr 1400 totalling 200ms (0.2% uptime), avg 0.1ms
tenures 0

