I created a Pharo issue to track the problem that the VM and image want to visit every Smalltalk object multiple times at startup.
Although this behavior is masked by gigahertz processors, it is very evident as a problem on the iPhone. Fixing it reduces RAM usage by megabytes and saves actual *seconds* of clock time at startup.

http://code.google.com/p/pharo/issues/detail?id=737&colspec=ID%20Type%20Status%20Summary%20Milestone&start=200

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================
On 14.04.2009, at 07:26, John M McIntosh wrote:

> I created a Pharo issue to track the problem that the VM and image
> want to visit every Smalltalk object multiple times at startup.
> Although this behavior is masked by gigahertz processors, it is very
> evident as a problem on the iPhone. Fixing it reduces RAM usage by
> megabytes and saves actual *seconds* of clock time at startup.
>
> http://code.google.com/p/pharo/issues/detail?id=737&colspec=ID%20Type%20Status%20Summary%20Milestone&start=200

Very nice. We experimented in that direction for OLPC, which is also comparatively slow CPU-wise, and even slower at loading the whole image from the flash disk (which involves decompressing). Mmapping only the pages needed should give a considerable boost.

Do we have evidence that an mmap base address of 500 MB works across platforms?

- Bert -
On Tue, Apr 14, 2009 at 10:58:15AM +0200, Bert Freudenberg wrote:

> Very nice. We experimented in that direction for OLPC which also is
> comparatively slow CPU wise, and even slower loading the whole image
> from the flash disk (which involves decompressing). Mmapping only the
> pages needed should give a considerable boost.
>
> Do we have evidence that an mmap base address of 500 MB works across
> platforms?

Well, there's only one way to find out ;)

AFAIK nobody has tried this on Linux yet, but it should not be hard to do. The necessary support is in VMMaker already (see Mantis 7233), so it would be a matter of changing the unix support code, adding two functions, and updating configure to point at the new functions.

You can get an idea of what's involved from the change set preamble:

  Change Set: Interpreter-readImageFromFile-jmm-dtl
  Date: 23 October 2008

  Pass image file and image header length to object memory allocation function in order to enable mmap loading without address swizzling, per jmm proposal.

  CCodeGenerator will provide default implementations in interp.h that are backward compatible with the existing platform support code. These defaults may be overridden by adding definitions such as the following to sqConfig.h (or config.h via configure for the unix VM).
  #define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) \
      myMemoryAllocator(heapSize, minimumMemory, fileStream, headerSize)

  #define sqImageFileReadEntireImage(memoryAddress, fileStream, elementSize, length) \
      myImageFileReader(memoryAddress, fileStream, elementSize, length)
In reply to this post by Bert Freudenberg
Yes. WikiServer (http://www.mobilewikiserver.com) is an image file of 10.5 MB.

At startup only 4.5 MB of oops memory is faulted in, and about 700K of memory is altered, which reduces initial memory use by 6 MB and reduces the startup time by 3+ seconds. Given the slow speed of the iPhone, and the fact that I have a 64 MB limit, 6 MB is a lot, and 3 seconds is welcome. Unfortunately a full GC will fault all 10.5 MB in; however, by doing some GC tuning one can avoid a full GC until things are quite stressed.

I note that on OS X desktop machines the entire 10.5 MB is read in and the pages are marked as non-referenced, but obviously the rules for the virtual memory subsystem are different there.

First let me suggest that in writeImageFileIO we change

    /* header size in bytes; do not change! */
    headerSize = 64;
    f = sqImageFileOpen(imageName, "wb");

from 64 bytes to 4096 bytes, if possible. Now let's explore why, and what is going on.

On unix, if you decide to use mmap rather than malloc to allocate storage for oops space, it does

    #define MAP_PROT (PROT_READ | PROT_WRITE)
    #define MAP_FLAGS (MAP_ANON | MAP_PRIVATE)
    mmap(0, heapLimit, MAP_PROT, MAP_FLAGS, devZero, 0)

where heapLimit is usually 1 GB and the requested start location is zero. This returns a start location somewhere in memory, never zero, and generally we have to swizzle all the object pointers. On a save and restart of the Squeak app the address you get back *may be* the same address; if it is the same we don't need to swizzle pointers. However, OpenBSD-based systems will likely always give a different location, for security reasons.

I had then set a start location of 128 MB, but found that on OS X, as the number of running apps goes up, you don't get 128 MB, so I settled on 500 MB, which seems OK. Mac Pros with 8 GB running 52 applications fail to get 500 MB, but the failure mode is that the kernel chooses its own address, so we don't care...

Well yes, that limits your Squeak image to 3.5 GB, but it's doubtful that a 32-bit system will let you allocate a contiguous chunk of memory larger than 2 GB anyway.
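To make the "hint, then check whether we must swizzle" idea concrete, here is a minimal sketch in C. It is not the VM's code: reserveHeap, swizzleDelta and PREFERRED_BASE are hypothetical names, and it passes -1 as the file descriptor with an anonymous mapping instead of the devZero descriptor used above. The address is passed without MAP_FIXED, so it is only a hint and the kernel is free to pick another location.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   /* older BSD / OS X spelling */
#endif

/* Hypothetical preferred base: the 500 MB figure discussed above. */
#define PREFERRED_BASE ((uintptr_t)500 * 1024 * 1024)

/* Ask for heapSize bytes at the preferred base. Without MAP_FIXED the
   address is only a hint; the kernel may place the mapping elsewhere.
   Returns NULL if the mapping fails. */
static void *reserveHeap(size_t heapSize)
{
    assert(heapSize > 0);
    void *p = mmap((void *)PREFERRED_BASE, heapSize,
                   PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Distance every saved object pointer has to move. Zero means the heap
   landed where the image was saved, so no swizzling is needed. */
static intptr_t swizzleDelta(void *actualBase, uintptr_t savedBase)
{
    return (intptr_t)((uintptr_t)actualBase - savedBase);
}
```

A caller would compare swizzleDelta(heap, baseRecordedInImageHeader) against zero and only walk the object pointers when it is nonzero.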
Now the next issue was that the original memory allocation logic would give you the 1 GB, and you would read the entire image into that memory area.

In thinking about this I wondered why you couldn't mmap the image file for the size of the file rounded up to the page size, then mmap anonymous memory after that, up to the desired heap size. So two mmaps: one for the file, followed by another for young space. I implemented this for the OS X VM and the iPhone VM.

In testing with a 500 MHz PowerPC laptop, I found the startup time was reduced by 30%, because it would fault in, say, a 20 MB image page by page as it ran the needless flush-primitive-calls logic, rather than reading the whole 20 MB into memory up front. The virtual memory pager was simply more efficient at pulling in the data, either through better I/O processing or through faster logic for finding free pages.

Problems: It turns out there is a flaw in the OS X BSD mmap logic when you mmap files on NFS drives (it hangs), and some people with, I think, overstressed systems reported issues with the first mmap failing. Because of this I reverted to the old logic by default, and put a flag in the Info.plist, SqueakUseFileMappedMMAP, to enable the new logic. Obviously for Linux you have to determine whether this flaw exists and whether it has been fixed.

Now the problem with headerSize.

In the file mmap case the entire file is mapped into memory at 500 MB, but the oops space starts 64 bytes into the file, so memory is at 500 MB + 64.
In the anonymous mmap case memory starts at 500 MB, and the oops space starts at offset 0, so memory is at 500 MB.

If the header size were 4096 we could mmap the file at 500 MB - 4096. Or, altering the anonymous case, we could allocate at 500 MB but put the oops space at 500 MB + 64 (the header size). However, by using a headerSize of 4096 we can get the oops space to start on a page boundary, which may or may not have implications.

Anyway, it would be good to resolve this bit of tricky logic.
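The two-mmap layout can be sketched roughly as follows. This is an illustration, not the OS X or iPhone VM code: ImageMapping, roundUpToPage and mapImage are hypothetical names, and reserving the whole range with PROT_NONE first (so that both MAP_FIXED mappings only ever replace our own reservation) is a choice made for this sketch rather than something described above. Note that the file is mapped from offset 0, so the 64-byte header sits at the start of the mapping, which is exactly the headerSize wrinkle just discussed.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   /* older BSD / OS X spelling */
#endif

typedef struct {
    unsigned char *base;   /* start of the file-backed part (header included) */
    size_t fileBytes;      /* image file size rounded up to a whole page */
    size_t totalBytes;     /* file part plus anonymous young space */
} ImageMapping;

static size_t roundUpToPage(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (n + page - 1) / page * page;
}

/* Map the image file, then anonymous memory right behind it.
   hint may be NULL. Returns 0 on success, -1 on failure. */
static int mapImage(const char *path, void *hint, size_t heapSize, ImageMapping *out)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) { close(fd); return -1; }

    size_t fileBytes = roundUpToPage((size_t)st.st_size);
    if (heapSize <= fileBytes) { close(fd); return -1; }  /* no room for young space */

    /* Reserve one contiguous range so the two pieces are adjacent. */
    unsigned char *base = mmap(hint, heapSize, PROT_NONE,
                               MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (base == MAP_FAILED) { close(fd); return -1; }

    /* Mapping 1: the image file, private copy-on-write, paged in on demand. */
    void *filePart = mmap(base, fileBytes, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_FIXED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */

    /* Mapping 2: zero-filled anonymous memory for young space. */
    void *young = MAP_FAILED;
    if (filePart != MAP_FAILED)
        young = mmap(base + fileBytes, heapSize - fileBytes, PROT_READ | PROT_WRITE,
                     MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
    if (young == MAP_FAILED) { munmap(base, heapSize); return -1; }

    out->base = base;
    out->fileBytes = fileBytes;
    out->totalBytes = heapSize;
    return 0;
}
```

Pages of the file part that are never written stay backed by the image file, so the pager can discard them under memory pressure and re-read them later; pages that are written become private copies.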
To determine how each page was viewed by the virtual memory subsystem, I stuck the following code into ioRelinquishProcessorForMicroseconds(), since ioRelinquishProcessorForMicroseconds() only gets triggered once the image finishes all its startup logic and becomes *idle*.

    extern unsigned char *memory;
    extern usqInt sqGetAvailableMemory();
    extern size_t fileRoundedUpToPageSize;

    size_t pageSize = getpagesize();
    size_t vmpagesize = sqGetAvailableMemory() / pageSize + 1;
    char *what = malloc(vmpagesize);
    int err = mincore(memory, sqGetAvailableMemory(), what);
    int countRef = 0, countMod = 0, countZero = 0, countOne = 0, i;

    /* Classify each page of the file-mapped part by its mincore flags (OS X bit values). */
    for (i = 0; i < fileRoundedUpToPageSize / pageSize; i++) {
        if (what[i] == 0) countZero++;   /* not in core */
        if (what[i] == 1) countOne++;    /* in core */
        if (what[i] == 3) countRef++;    /* in core and referenced */
        if (what[i] == 7) countMod++;    /* in core, referenced and modified */
        /* break for debugging here */
    }
    free(what);

On 14-Apr-09, at 1:58 AM, Bert Freudenberg wrote:

> Very nice. We experimented in that direction for OLPC which also is
> comparatively slow CPU wise, and even slower loading the whole image
> from the flash disk (which involves decompressing). Mmapping only
> the pages needed should give a considerable boost.
>
> Do we have evidence that an mmap base address of 500 MB works across
> platforms?
>
> - Bert -

--
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
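For anyone wanting to repeat the measurement outside the VM, here is a small standalone sketch of the same idea. countResidentPages is a hypothetical helper, not part of any VM source. It only tests the low bit of each mincore entry, which is the residency bit on both Linux and OS X; the referenced and modified bits counted above are specific to the BSD-derived implementation and are not assumed here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count how many pages of [addr, addr + length) are resident in RAM.
   addr must be page aligned. Returns -1 if mincore or malloc fails. */
static long countResidentPages(void *addr, size_t length)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t pages = (length + page - 1) / page;
    unsigned char *vec;
    long resident = 0;

    assert((uintptr_t)addr % page == 0);
    vec = malloc(pages);
    if (vec == NULL) return -1;

    /* The vector type differs (char* on OS X, unsigned char* on Linux),
       so pass it as void* and let C convert implicitly. */
    if (mincore(addr, length, (void *)vec) != 0) { free(vec); return -1; }

    for (size_t i = 0; i < pages; i++)
        if (vec[i] & 1) resident++;   /* bit 0: page is in core */

    free(vec);
    return resident;
}
```

Calling it on the file-mapped part of the heap once the image is idle gives the "faulted in" figure; the difference from the total page count is what never had to leave the disk.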
Well, a PermSpace would be helpful, since it would avoid a GC that causes the header bits to change, marking all pages as written; if the pages are written then the virtual memory system can't free them under stress.

However, the real benefit here was that I actually never touch 6 MB of objects, so they are never moved from flash into dynamic RAM. So I do not need to find 1,600 pages of RAM to hold pages which the application doesn't need. If I did an allInstances it would traverse PermSpace and bring all those 1,600 pages into RAM, and I really want to avoid that.

For example, if the iPhone receives a phone call, it starts up a daemon to display a dialog asking if you want to answer or ignore; memory for that has to come from somewhere. Or a double-click of the home key can bring up the iPod dialog prompt if the iPod audio app is running; again, pages have to come from somewhere. With 700K of old objects and about 5 MB for young space, the 4.5 MB of image objects that are faulted in but not written can be freed, since a page fault will re-read them from the image file.

On 14-Apr-09, at 11:02 AM, Nicolas Cellier wrote:

> 2009/4/14 John M McIntosh <[hidden email]>:
>> Yes, so WikiServer ( http://www.mobilewikiserver.com ) is an image
>> file of 10.5 MB.
>>
>> As startup only 4.5 MB of OOPS memory is faulted in, and about 700K
>> of memory is altered, which reduces initial memory use by 6MB and
>> reduces the startup time by 3+ seconds.
>> Given the slow speed of the iPhone, and the fact I've a 64MB limit,
>> 6MB is a lot, and 3 seconds is welcome. Unfortunately a full GC will
>> fault all that 10.5 MB in, however by doing some GC tuning one
>> can avoid a full GC until things are quite stressed.
>
> Wouldn't a VW-like PermSpace reduce the burden?
>
> Nicolas

--
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com