[squeak-dev] hampering the desire of the VM and image to visit every object at startup time (multiple times)

johnmci
I created a Pharo issue to track the problem that the VM and image
want to visit every Smalltalk object multiple times at startup.
Although this behavior is masked by gigahertz processors, it is very
evident as a problem on the iPhone. Fixing it reduces RAM usage by
megabytes and saves actual *seconds* of clock time at startup.

http://code.google.com/p/pharo/issues/detail?id=737&colspec=ID%20Type%20Status%20Summary%20Milestone&start=200
--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================




[squeak-dev] Re: [Pharo-project] hampering the desire of the VM and image to visit every object at startup time (multiple times)

Stéphane Ducasse
thanks

Stef

On Apr 14, 2009, at 7:26 AM, John M McIntosh wrote:

> I created a Pharo issue to track the problem that the VM and image
> want to visit every Smalltalk object multiple times at startup.
> Although this behavior is masked by gigahertz processors, it is very
> evident as a problem on the iPhone. Fixing it reduces RAM usage by
> megabytes and saves actual *seconds* of clock time at startup.
>
> http://code.google.com/p/pharo/issues/detail?id=737&colspec=ID%20Type%20Status%20Summary%20Milestone&start=200


[squeak-dev] Re: [Vm-dev] hampering the desire of the VM and image to visit every object at startup time (multiple times)

Bert Freudenberg
In reply to this post by johnmci

On 14.04.2009, at 07:26, John M McIntosh wrote:

> I created a Pharo issue to track the problem that the VM and image
> want to visit every Smalltalk object multiple times at startup.
> Although this behavior is masked by gigahertz processors, it is very
> evident as a problem on the iPhone. Fixing it reduces RAM usage by
> megabytes and saves actual *seconds* of clock time at startup.
>
> http://code.google.com/p/pharo/issues/detail?id=737&colspec=ID%20Type%20Status%20Summary%20Milestone&start=200

Very nice. We experimented in that direction for OLPC which also is  
comparatively slow CPU wise, and even slower loading the whole image  
from the flash disk (which involves decompressing). Mmapping only the  
pages needed should give a considerable boost.

Do we have evidence that an mmap base address of 500 MB works across  
platforms?

- Bert -


[squeak-dev] Re: [Vm-dev] hampering the desire of the VM and image to visit every object at startup time (multiple times)

johnmci
Yes, so WikiServer  ( http://www.mobilewikiserver.com ) is an image  
file of 10.5 MB.

At startup only 4.5 MB of oops memory is faulted in, and about 700K of
it is altered, which reduces initial memory use by 6 MB and reduces
the startup time by 3+ seconds.
Given the slow speed of the iPhone, and the fact that I have a 64 MB
limit, 6 MB is a lot, and 3 seconds is welcome. Unfortunately a full GC
will fault all of that 10.5 MB in. However, with some GC tuning one
can avoid a full GC until things are quite stressed.

I note on os-x desktop machines the entire 10.5MB is read in and the  
pages marked as non-referenced, but obviously the rules for the  
virtual memory subsystem are different.


First let me suggest we change

writeImageFileIO


        /* header size in bytes; do not change! */

        headerSize = 64;
        f = sqImageFileOpen(imageName, "wb");


from 64 bytes to 4096 bytes, if possible.

Now let's explore why, and what is going on.

On Unix, if you choose mmap rather than malloc to allocate storage
for oops space, the VM does

#define MAP_PROT (PROT_READ | PROT_WRITE)
#define MAP_FLAGS (MAP_ANON | MAP_PRIVATE)
  mmap(0, heapLimit, MAP_PROT, MAP_FLAGS, devZero, 0)


where heapLimit is usually 1GB and the requested start address is zero.

This returns a start location somewhere in memory, never zero, and
generally we have to swizzle all the object pointers. On a save and
restart of the Squeak app the address you get back *may be* the same
address; if it is the same we don't need to swizzle pointers. However,
OpenBSD-based systems will likely always give a different location for
security reasons.
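
A minimal sketch of that swizzle decision, assuming a toy heap in which every word-sized slot is either 0 or an absolute address into the saved heap. The real VM walks object headers and skips tagged integers; `swizzle_heap`, `old_base` and `new_base` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: a heap of word-sized slots, each either 0 (nil) or an
   absolute address inside the heap as it was laid out at save time.
   Returns the number of slots rewritten. Hypothetical helper, not VM code. */
size_t swizzle_heap(uintptr_t *heap, size_t nslots,
                    uintptr_t old_base, uintptr_t new_base)
{
    size_t i, changed = 0;
    uintptr_t delta = new_base - old_base;  /* unsigned wraparound is fine */

    if (delta == 0)
        return 0;       /* same address as last time: no slot is written */

    for (i = 0; i < nslots; i++) {
        if (heap[i] != 0) {
            heap[i] += delta;
            changed++;
        }
    }
    return changed;
}
```

The point of the early return is that when the base address is unchanged no page of the heap is written, so nothing is dirtied at startup on account of swizzling.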

I had then set a start location of 128MB, but found on os-x that as the
number of apps goes up you don't get 128MB, so I settled on 500MB, which
seems ok. 8GB pro macs running 52 applications fail at 500MB, but the
failure is just that the system chooses its own address, so we don't care...

Well yes that limits your squeak image to 3.5GB but it's doubtful that  
a 32bit system will let you allocate a contiguous chunk of memory > 2  
GB anyway.

The next issue was that the original memory allocation logic would give
you the 1GB, and you would then read the entire image into that memory
area.

In thinking about this I wondered: why not mmap the image file itself,
for the size of the file rounded up to the page size, and then mmap
anonymous memory immediately after it, up to the desired heap size?

So two mmaps, one for the file, followed by another for young space.

I implemented this for the os-x vm and the iPhone VM.

In testing on a 500 MHz PowerPC laptop, I found the startup time was
reduced by 30%. Instead of reading, say, a 20MB image into memory up
front, the VM faulted it in page by page as it ran the needless flush
primitive calls logic, and the virtual memory pager was simply more
efficient at pulling in the data, either through better I/O processing
or faster logic for finding free pages.

Problems:

It turns out there is a flaw in the OS-X BSD mmap logic: when you mmap
files on NFS drives, it hangs. Some people, with what I think were
overstressed systems, also reported issues with the first mmap failing.
Because of this I reverted to the old logic by default, and put a
flag, SqueakUseFileMappedMMAP, in the info.plist to enable the new logic.
For Linux you will have to determine whether this flaw exists or has
been fixed.

Now the problem with headerSize

In the file mmap case the entire file is mapped into memory at 500MB,  
but the oops space starts at 64 bytes, so memory is at 500MB+64
In the anonymous mmap case memory starts at 500MB, but the oops space  
starts at 0, so memory is at 500MB.

If the headerSize were 4096 we could mmap the file at 500MB-4096.
Or, in the anonymous case, we could allocate at 500MB but place the
oops space at 500MB+64 (the header size).

However by using a headerSize of 4096 we can then get the oops space  
to start on a page boundary, which may or may not have implications.
Anyway it would be good to resolve this bit of tricky logic.
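
The address arithmetic above can be checked with a small sketch, assuming 4096-byte pages (other systems use larger pages) and the 500MB base from this thread. All three function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of the first object when the whole file, header included,
   is mapped at map_base. */
uintptr_t heap_start(uintptr_t map_base, size_t header_size)
{
    return map_base + header_size;
}

/* Where the file would have to be mapped so that the first object
   lands exactly at heap_base. */
uintptr_t map_base_for(uintptr_t heap_base, size_t header_size)
{
    return heap_base - header_size;
}

/* 1 when addr is a multiple of page_size; page_size must be a power
   of two. */
int is_page_aligned(uintptr_t addr, size_t page_size)
{
    return (addr & (page_size - 1)) == 0;
}
```

With a 64-byte header, `map_base_for(500MB, 64)` is not page aligned, so the file cannot be mapped there; with a 4096-byte header and 4096-byte pages it is. With a larger page size the 4096-byte header would stop working too.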

I stuck the following code into ioRelinquishProcessorForMicroseconds(),
since that only gets triggered once the image finishes all its startup
logic and becomes *idle*, so that I could determine how each page was
viewed by the virtual memory subsystem.


        extern unsigned char *memory;
        extern usqInt sqGetAvailableMemory();
        extern size_t fileRoundedUpToPageSize;
        size_t pageSize = getpagesize();
        size_t vmpagesize = sqGetAvailableMemory()/pageSize + 1;
        char *what = malloc(vmpagesize);  /* one status byte per page */
        int err = mincore(memory, sqGetAvailableMemory(), what);
        int countRef=0, countMod=0, countZero=0, countOne=0, i;
        /* OS X status bits: 1 = in core, 2 = referenced, 4 = modified */
        for (i=0; i<fileRoundedUpToPageSize/pageSize; i++) {
                if (what[i] == 0) countZero++;  /* not resident */
                if (what[i] == 1) countOne++;   /* resident only */
                if (what[i] == 3) countRef++;   /* resident + referenced */
                if (what[i] == 7) countMod++;   /* resident + referenced + modified */
                /* break for debugging here */
        }

        free(what);




On 14-Apr-09, at 1:58 AM, Bert Freudenberg wrote:

>
> On 14.04.2009, at 07:26, John M McIntosh wrote:
>
>> I created a Pharo issue to track the problem that the VM and image
>> want to visit every Smalltalk object multiple times at startup.
>> Although this behavior is masked by gigahertz processors, it is very
>> evident as a problem on the iPhone. Fixing it reduces RAM usage by
>> megabytes and saves actual *seconds* of clock time at startup.
>>
>> http://code.google.com/p/pharo/issues/detail?id=737&colspec=ID%20Type%20Status%20Summary%20Milestone&start=200
>
> Very nice. We experimented in that direction for OLPC which also is  
> comparatively slow CPU wise, and even slower loading the whole image  
> from the flash disk (which involves decompressing). Mmapping only  
> the pages needed should give a considerable boost.
>
> Do we have evidence that an mmap base address of 500 MB works across  
> platforms?
>
> - Bert -
>

Re: [squeak-dev] Re: [Vm-dev] hampering the desire of the VM and image to visit every object at startup time (multiple times)

Eliot Miranda-2


On Tue, Apr 14, 2009 at 10:04 AM, John M McIntosh <[hidden email]> wrote:
> Now the problem with headerSize
>
> In the file mmap case the entire file is mapped into memory at 500MB,
> but the oops space starts at 64 bytes, so memory is at 500MB+64
> In the anonymous mmap case memory starts at 500MB, but the oops space
> starts at 0, so memory is at 500MB.

John, I think this is a non-issue.  There is no reason to worry about the unused 64 bytes at the beginning of the image file.  Simply map it at 500 MB and have the heap start at 500 MB + 64.  So the header is mapped into memory.  This should make snapshot a little easier because one simply updates the header in place in memory.

In any case you're never going to find an ideal header size.  Some systems have very large page sizes and it would be silly to choose a header as large as, say, 1 MB.  Better to keep it on the small side and map it into memory where it won't be a problem.



> If the headerSize were 4096 we could mmap the file at 500MB-4096.
> Or, in the anonymous case, we could allocate at 500MB but place the
> oops space at 500MB+64 (the header size).
>
> However by using a headerSize of 4096 we can then get the oops space
> to start on a page boundary, which may or may not have implications.

Nah.  64 bytes beyond a page boundary for the first few objects won't make any difference.   Don't sweat it.


[squeak-dev] Re: [Pharo-project] [Vm-dev] hampering the desire of the VM and image to visit every object at startup time (multiple times)

johnmci
In reply to this post by johnmci
Well, a PermSpace would be helpful, since it would avoid a GC that
changes the header bits and so marks all pages as written; once a page
is written, the virtual memory system can't free it under stress.

However, the real benefit here is that I never touch 6MB of objects at
all, so they are never moved from flash into dynamic RAM. I do not need
to find 1,600 pages of RAM to hold pages which the application doesn't
need. If I did an allInstances it would traverse PermSpace and bring all
those 1,600 pages into RAM, which I really want to avoid.

For example, if the iPhone receives a phone call, it starts up a daemon
to display a dialog asking if you want to answer or ignore; memory for
that has to come from somewhere. Or a double-click of the home key can
bring up the iPod dialog prompt if the iPod audio app is running; again,
pages have to come from somewhere.

With 700K of old objects and about 5MB for young space, the 4.5MB of
image objects that is faulted in but not written can be freed, since a
page fault will re-read them from the image file.

On 14-Apr-09, at 11:02 AM, Nicolas Cellier wrote:

> 2009/4/14 John M McIntosh <[hidden email]>:
>> Yes, so WikiServer ( http://www.mobilewikiserver.com ) is an image
>> file of 10.5 MB.
>>
>> At startup only 4.5 MB of oops memory is faulted in, and about 700K
>> of it is altered, which reduces initial memory use by 6 MB and
>> reduces the startup time by 3+ seconds.
>> Given the slow speed of the iPhone, and the fact that I have a 64 MB
>> limit, 6 MB is a lot, and 3 seconds is welcome. Unfortunately a full
>> GC will fault all of that 10.5 MB in. However, with some GC tuning
>> one can avoid a full GC until things are quite stressed.
>>
>
> Wouldn't a VW-like PermSpace reduce the burden?
>
> Nicolas

Re: [squeak-dev] Re: [Vm-dev] hampering the desire of the VM and image to visit every object at startup time (multiple times)

David Farber
In reply to this post by Eliot Miranda-2
On Apr 14, 2009, at 11:31 AM, Eliot Miranda wrote:
> On Tue, Apr 14, 2009 at 10:04 AM, John M McIntosh <[hidden email]> wrote:
>> Now the problem with headerSize
>>
>> In the file mmap case the entire file is mapped into memory at 500MB,
>> but the oops space starts at 64 bytes, so memory is at 500MB+64
>> In the anonymous mmap case memory starts at 500MB, but the oops space
>> starts at 0, so memory is at 500MB.
>
> John, I think this is a non-issue.  There is no reason to worry about
> the unused 64 bytes at the beginning of the image file.  Simply map it
> at 500 MB and have the heap start at 500 MB + 64.  So the header is
> mapped into memory.  This should make snapshot a little easier because
> one simply updates the header in place in memory.

Actually, if the header was mapped into memory *and* kept up to date, it would make recovering images from core dumps trivial.

David