Apologies, accidentally posted to squeak-dev instead of vm-dev. - A. -------- Original Message -------- Subject: Image file loading Date: Tue, 19 Jan 2010 14:06:00 -0800 From: Andreas Raab <[hidden email]> Reply-To: The general-purpose Squeak developers list <[hidden email]> To: The general-purpose Squeak developers list <[hidden email]> Newsgroups: gmane.comp.lang.smalltalk.squeak.general Folks - I think that one of the more clever hacks I did for the Android VM deserves proper generalization in the Interpreter. Since on Android the image is part of the (compressed) application package it's not easily possible to pass a file handle into the VM and have the VM read the image as an external file. Instead, I'm preloading the image file into memory and then hacked an implementation of the sqImageFile* and sqAlloc* functions that would basically operate on the image in memory. I think we should formalize this idea for two reasons: a) It allows preloading the image into memory on platforms where the image may not be in an external file. b) It trivially allows mmap()ing the entire image into memory (I think John does that on the iPhone). What I'm proposing is an additional VM entry point, called Interpreter>>readImageFromHeap: heapStart size: heapSize which takes to arguments: The pointer to the beginning of the (pre-allocated) memory portion where the image file has been loaded or mapped and a size for the allocated memory portion. The VM then assumes that beginning at heapStart there is an image provided that it needs to properly prepare (i.e., perform byte-reversal, pointer adjustment etc). The VMs would generally use this along the lines of: memStart = self malloc(heapSize); fread(imageFile, memStart, fsize(imageFile)); readImageFromHeapSize(memStart, heapSize); and of course a backwards compatible version of readImageFromFile:HeapSize:StartingAt: could do the same (although I'd rather drop that function altogether since it removes the need for supporting various sqImageFile* functions). I'd also say that the same pattern should be used for *writing* image files, that is we assemble the entire image in memory then call sqImageFileWrite(fileName, heapStart, heapSize) and sqImageFileWrite is really just a combo of fopen(); fwrite(); fclose(); I think this scheme is much nicer, less complicated and offers more options for implementing image loading than what we currently have. Comments? Cheers, - Andreas |
On 2010-01-19, at 2:30 PM, Andreas Raab wrote: > b) It trivially allows mmap()ing the entire image into memory (I think > John does that on the iPhone). So how is this different that what happens now? In interp.h we have: #ifndef allocateMemoryMinimumImageFileHeaderSize /* Called by Interpreter>>allocateMemory:minimum:imageFile:headerSize: */ #define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemory(minimumMemory, heapSize) #endif #ifndef sqImageFileReadEntireImage /* Called by Interpreter>>sqImage:read:size:length: */ #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileRead(memoryAddress, elementSize, length, fileStream) #endif In platform specfic I change that usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize); #define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemoryMac(heapSize, minimumMemory, fileStream, headerSize) #ifdef BUILD_FOR_OSX size_t sqImageFileReadEntireImage(void *ptr, size_t elementSize, size_t count, FILE * f); #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) #else #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) length #endif The interp.c slang generated below on os-x the sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) does read the data from fileStream for the given length into memoryAddress. But on the iPhone we just return the length as per the #define above since the allocateMemoryMinimumImageFileHeaderSize has mmapped the file. slang... memory = allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, f, headerSize); if (memory == null) { insufficientMemoryAvailableError(); } ... sqImageFileSeek(f, headerStart + headerSize); /* begin sqImage:read:size:length: */ memoryAddress = pointerForOop(memory); bytesRead = sqImageFileReadEntireImage(memoryAddress, sizeof(unsigned char), dataSize, f); if (bytesRead != dataSize) { unableToReadImageError(); } -- =========================================================================== John M. McIntosh <[hidden email]> Twitter: squeaker68882 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com =========================================================================== |
John M McIntosh wrote: > On 2010-01-19, at 2:30 PM, Andreas Raab wrote: > >> b) It trivially allows mmap()ing the entire image into memory (I think >> John does that on the iPhone). > > So how is this different that what happens now? That it removes hundreds of lines of completely unnecessary code. Yes, one can implement the same logic with the current interfaces, I just did that on Android :-) But at some point one must question the approach if the end result is only adding additional complexity. That's why I'm saying move all of this out so that the interpreter doesn't have to deal with it. The task of allocating the initial amount of memory and loading the image into said memory should be done by support code *before* calling the interpreter the first time. Then, give the interpreter one entry point and then remove all the extra complexity. Which means: > #ifndef allocateMemoryMinimumImageFileHeaderSize Gets removed. > /* Called by Interpreter>>allocateMemory:minimum:imageFile:headerSize: */ > #define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemory(minimumMemory, heapSize) Gets removed. > #ifndef sqImageFileReadEntireImage Gets removed. > #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileRead(memoryAddress, elementSize, length, fileStream) Gets removed. > In platform specfic I change that > > usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize); Gets removed. > #ifdef BUILD_FOR_OSX > size_t sqImageFileReadEntireImage(void *ptr, size_t elementSize, size_t count, FILE * f); > #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) > #else > #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) length > #endif Gets removed. Some more things that get removed is in CCodeGenerator>>writeDefaultMacrosOn: a whole bunch of sqImageFile* definitions in sq.h and more. Cheers, - Andreas |
For anonymous mmap I need fread(imageFile, memStart, fsize(imageFile)); to mmap allocate and read the file into the anonymous mmap readImageFromHeapSize does nothing. or for file based mmap I need fread(imageFile, memStart, fsize(imageFile)); to mmap the image file. So I need the readImageFromHeapSize for? On 2010-01-19, at 3:09 PM, Andreas Raab wrote: > John M McIntosh wrote: >> On 2010-01-19, at 2:30 PM, Andreas Raab wrote: >>> b) It trivially allows mmap()ing the entire image into memory (I think >>> John does that on the iPhone). >> So how is this different that what happens now? > > That it removes hundreds of lines of completely unnecessary code. Yes, one can implement the same logic with the current interfaces, I just did that on Android :-) But at some point one must question the approach if the end result is only adding additional complexity. > > That's why I'm saying move all of this out so that the interpreter doesn't have to deal with it. The task of allocating the initial amount of memory and loading the image into said memory should be done by support code *before* calling the interpreter the first time. Then, give the interpreter one entry point and then remove all the extra complexity. Which means: > >> #ifndef allocateMemoryMinimumImageFileHeaderSize > > Gets removed. > >> /* Called by Interpreter>>allocateMemory:minimum:imageFile:headerSize: */ >> #define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemory(minimumMemory, heapSize) > > Gets removed. > >> #ifndef sqImageFileReadEntireImage > > Gets removed. > >> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileRead(memoryAddress, elementSize, length, fileStream) > > Gets removed. > >> In platform specfic I change that usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize); > > Gets removed. > >> #ifdef BUILD_FOR_OSX >> size_t sqImageFileReadEntireImage(void *ptr, size_t elementSize, size_t count, FILE * f); >> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) >> #else >> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) length #endif > > Gets removed. > > Some more things that get removed is in CCodeGenerator>>writeDefaultMacrosOn: a whole bunch of sqImageFile* definitions in sq.h and more. > > Cheers, > - Andreas -- =========================================================================== John M. McIntosh <[hidden email]> Twitter: squeaker68882 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com =========================================================================== |
John M McIntosh wrote: > For anonymous mmap I need > fread(imageFile, memStart, fsize(imageFile)); > to mmap allocate and read the file into the anonymous mmap > > readImageFromHeapSize does nothing. > > or for file based mmap I need > fread(imageFile, memStart, fsize(imageFile)); > to mmap the image file. > > So I need the readImageFromHeapSize for? To tell the VM where your memory is located (replacement of sqAlloc*), and how much you've pre-allocated. Like I wrote in the original post: "The VMs would generally use this along the lines of: memStart = malloc(heapSize); fread(imageFile, memStart, fsize(imageFile)); readImageFromHeapSize(memStart, heapSize); etc." Cheers, - Andreas > On 2010-01-19, at 3:09 PM, Andreas Raab wrote: > >> John M McIntosh wrote: >>> On 2010-01-19, at 2:30 PM, Andreas Raab wrote: >>>> b) It trivially allows mmap()ing the entire image into memory (I think >>>> John does that on the iPhone). >>> So how is this different that what happens now? >> That it removes hundreds of lines of completely unnecessary code. Yes, one can implement the same logic with the current interfaces, I just did that on Android :-) But at some point one must question the approach if the end result is only adding additional complexity. >> >> That's why I'm saying move all of this out so that the interpreter doesn't have to deal with it. The task of allocating the initial amount of memory and loading the image into said memory should be done by support code *before* calling the interpreter the first time. Then, give the interpreter one entry point and then remove all the extra complexity. Which means: >> >>> #ifndef allocateMemoryMinimumImageFileHeaderSize >> Gets removed. >> >>> /* Called by Interpreter>>allocateMemory:minimum:imageFile:headerSize: */ >>> #define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemory(minimumMemory, heapSize) >> Gets removed. >> >>> #ifndef sqImageFileReadEntireImage >> Gets removed. >> >>> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileRead(memoryAddress, elementSize, length, fileStream) >> Gets removed. >> >>> In platform specfic I change that usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize); >> Gets removed. >> >>> #ifdef BUILD_FOR_OSX >>> size_t sqImageFileReadEntireImage(void *ptr, size_t elementSize, size_t count, FILE * f); >>> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) >>> #else >>> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) length #endif >> Gets removed. >> >> Some more things that get removed is in CCodeGenerator>>writeDefaultMacrosOn: a whole bunch of sqImageFile* definitions in sq.h and more. >> >> Cheers, >> - Andreas > > -- > =========================================================================== > John M. McIntosh <[hidden email]> Twitter: squeaker68882 > Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com > =========================================================================== > > > > |
In reply to this post by Andreas.Raab
On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote: > > What I'm proposing is an additional VM entry point, called > > Interpreter>>readImageFromHeap: heapStart size: heapSize > > which takes to arguments: The pointer to the beginning of the > (pre-allocated) memory portion where the image file has been loaded or > mapped and a size for the allocated memory portion. The VM then assumes > that beginning at heapStart there is an image provided that it needs to > properly prepare (i.e., perform byte-reversal, pointer adjustment etc). A suggestion: Interpreter>>installImage: address size: imageSize header: header Where header is the address of the image file header, and address is the address of the uninitialized image beginning after the file header. Rationale: - "Install" is more appropriate that "read", since the contents of the image file have already been read. - "Heap" is not relevant. - The file header is logically distinct from from the contents of the image proper, and since someone else did the reading, it seems better to pass them as two distinct things (*). The pointer to header is presumed to be a memory block of at least 64 bytes, so in the usual case of a 32 bit image read into a block of malloc'ed space, we expect that address == header + 64. My opinion: Adding the entry point is easy to do and has a clear benefit for at least one platform of interest, so we should do this. Removing the existing entry point is debatable and can be done at a later time. Dave (*) I've noticed that the format of the file header for 64-bit images is not optimal, so a logical separation of the file header from the image proper just seems like the right thing to do. |
David T. Lewis wrote: > On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote: >> What I'm proposing is an additional VM entry point, called >> >> Interpreter>>readImageFromHeap: heapStart size: heapSize >> >> which takes to arguments: The pointer to the beginning of the >> (pre-allocated) memory portion where the image file has been loaded or >> mapped and a size for the allocated memory portion. The VM then assumes >> that beginning at heapStart there is an image provided that it needs to >> properly prepare (i.e., perform byte-reversal, pointer adjustment etc). > > A suggestion: > > Interpreter>>installImage: address size: imageSize header: header I don't like that. It implies that the support code needs to be able to dissect the image file to find out how much is header and how much is data. That implies the support code needs to know about image format, byte reversal, and several other things. The interpreter already does that so why should we have to duplicate that in the support code again? > Rationale: > > - "Install" is more appropriate that "read", since the contents of the > image file have already been read. > > - "Heap" is not relevant. Fair enough. I don't really care what it's called. > - The file header is logically distinct from from the contents of the > image proper, and since someone else did the reading, it seems better > to pass them as two distinct things (*). The pointer to header is > presumed to be a memory block of at least 64 bytes, so in the usual > case of a 32 bit image read into a block of malloc'ed space, we expect > that address == header + 64. See above. You're making it artificially harder for the support code by requiring it to dissect the file and figure out what the header is. There is really no point to that. > My opinion: Adding the entry point is easy to do and has a clear benefit > for at least one platform of interest, so we should do this. Removing > the existing entry point is debatable and can be done at a later time. Why do you think removing the existing entry point is debatable? I can't see a long-term benefit for using readImageFromFile: given that the new entry point trivially subsumes the old one. Cheers, - Andreas |
In reply to this post by Andreas.Raab
2010/1/20 Andreas Raab <[hidden email]>: > > John M McIntosh wrote: >> >> For anonymous mmap I need fread(imageFile, memStart, fsize(imageFile)); >> to mmap allocate and read the file into the anonymous mmap >> >> readImageFromHeapSize does nothing. >> or for file based mmap I need fread(imageFile, memStart, >> fsize(imageFile)); >> to mmap the image file. >> So I need the readImageFromHeapSize for? > > To tell the VM where your memory is located (replacement of sqAlloc*), and > how much you've pre-allocated. Like I wrote in the original post: > > "The VMs would generally use this along the lines of: > > memStart = malloc(heapSize); > fread(imageFile, memStart, fsize(imageFile)); > readImageFromHeapSize(memStart, heapSize); > > etc." > It could be a memory location, it could be a socket stream, it could be anything. So, i very like your idea with replacing an entry point by abstract readImageFromObject: object size: heapSize which could be just a 2-liner: memStart := self sqMalloc: heapSize. self ioReadFrom: object Into: memStart Size: heapSize. (and object is of type 'void *', which leaves a platform code to decide its meaning) sqMalloc() and ioReadFromIntoSize(), also provided by platform-specific code).. This makes an interpreter completely agnostic from data source, where actual image is stored. > Cheers, > - Andreas > -- Best regards, Igor Stasenko AKA sig. |
In reply to this post by Andreas.Raab
Ohh, even better. Platform code invoking: sqReadImageFrom(void *object). then, an interpreter reading a header and determining a heap size. (a question, though, does image size encoded in header?) once interpreter prepares a memory for image and decoding the header, it makes a second call to platform code to read the object memory, and we're done. No file access, no squeakFileOffsetType mess. Nothing. A platform code deciding these details by own! 2010/1/20 Andreas Raab <[hidden email]>: > > David T. Lewis wrote: >> >> On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote: >>> >>> What I'm proposing is an additional VM entry point, called >>> >>> Interpreter>>readImageFromHeap: heapStart size: heapSize >>> >>> which takes to arguments: The pointer to the beginning of the >>> (pre-allocated) memory portion where the image file has been loaded or >>> mapped and a size for the allocated memory portion. The VM then assumes >>> that beginning at heapStart there is an image provided that it needs to >>> properly prepare (i.e., perform byte-reversal, pointer adjustment etc). >> >> A suggestion: >> >> Interpreter>>installImage: address size: imageSize header: header > > I don't like that. It implies that the support code needs to be able to > dissect the image file to find out how much is header and how much is data. > That implies the support code needs to know about image format, byte > reversal, and several other things. The interpreter already does that so why > should we have to duplicate that in the support code again? > >> Rationale: >> >> - "Install" is more appropriate that "read", since the contents of the >> image file have already been read. >> >> - "Heap" is not relevant. > > Fair enough. I don't really care what it's called. > >> - The file header is logically distinct from from the contents of the >> image proper, and since someone else did the reading, it seems better >> to pass them as two distinct things (*). The pointer to header is >> presumed to be a memory block of at least 64 bytes, so in the usual >> case of a 32 bit image read into a block of malloc'ed space, we expect >> that address == header + 64. > > See above. You're making it artificially harder for the support code by > requiring it to dissect the file and figure out what the header is. There is > really no point to that. > >> My opinion: Adding the entry point is easy to do and has a clear benefit >> for at least one platform of interest, so we should do this. Removing >> the existing entry point is debatable and can be done at a later time. > > Why do you think removing the existing entry point is debatable? I can't see > a long-term benefit for using readImageFromFile: given that the new entry > point trivially subsumes the old one. > > Cheers, > - Andreas > -- Best regards, Igor Stasenko AKA sig. |
Igor Stasenko wrote: > Platform code invoking: > > sqReadImageFrom(void *object). > > then, an interpreter reading a header and determining a heap size. > (a question, though, does image size encoded in header?) The image size is encoded in the file header. The heap size isn't. That's where the second argument came from that I provided. > once interpreter prepares a memory for image and decoding the header, > it makes a second call to platform code to read the object memory, and > we're done. Why even do that? The platform code can allocate and read the thing before it passes it on to the interpreter. Read my example again: main.c: memory = malloc(heapSize); fread(imageFile, memory, fsize(imageFile)); readImageFromHeapSize(memory, heapSize); interpret(); The platform code reads the allocates memory and reads the image. It can do this by reading a file or (in the Android case) by doing a series of JNI calls that pass uncompressed portions of a chunked image file from the app package (yes, it is complicated on Android). Or the support code could read it from a socket connection. Whatever. There is no reason for the interpreter to be involved *at all* in reading the image file. Cheers, - Andreas > No file access, no squeakFileOffsetType mess. Nothing. > A platform code deciding these details by own! > > 2010/1/20 Andreas Raab <[hidden email]>: >> David T. Lewis wrote: >>> On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote: >>>> What I'm proposing is an additional VM entry point, called >>>> >>>> Interpreter>>readImageFromHeap: heapStart size: heapSize >>>> >>>> which takes to arguments: The pointer to the beginning of the >>>> (pre-allocated) memory portion where the image file has been loaded or >>>> mapped and a size for the allocated memory portion. The VM then assumes >>>> that beginning at heapStart there is an image provided that it needs to >>>> properly prepare (i.e., perform byte-reversal, pointer adjustment etc). >>> A suggestion: >>> >>> Interpreter>>installImage: address size: imageSize header: header >> I don't like that. It implies that the support code needs to be able to >> dissect the image file to find out how much is header and how much is data. >> That implies the support code needs to know about image format, byte >> reversal, and several other things. The interpreter already does that so why >> should we have to duplicate that in the support code again? >> >>> Rationale: >>> >>> - "Install" is more appropriate that "read", since the contents of the >>> image file have already been read. >>> >>> - "Heap" is not relevant. >> Fair enough. I don't really care what it's called. >> >>> - The file header is logically distinct from from the contents of the >>> image proper, and since someone else did the reading, it seems better >>> to pass them as two distinct things (*). The pointer to header is >>> presumed to be a memory block of at least 64 bytes, so in the usual >>> case of a 32 bit image read into a block of malloc'ed space, we expect >>> that address == header + 64. >> See above. You're making it artificially harder for the support code by >> requiring it to dissect the file and figure out what the header is. There is >> really no point to that. >> >>> My opinion: Adding the entry point is easy to do and has a clear benefit >>> for at least one platform of interest, so we should do this. Removing >>> the existing entry point is debatable and can be done at a later time. >> Why do you think removing the existing entry point is debatable? I can't see >> a long-term benefit for using readImageFromFile: given that the new entry >> point trivially subsumes the old one. >> >> Cheers, >> - Andreas >> > > > |
2010/1/20 Andreas Raab <[hidden email]>: > > Igor Stasenko wrote: >> >> Platform code invoking: >> >> sqReadImageFrom(void *object). >> >> then, an interpreter reading a header and determining a heap size. >> (a question, though, does image size encoded in header?) > > The image size is encoded in the file header. The heap size isn't. That's > where the second argument came from that I provided. > >> once interpreter prepares a memory for image and decoding the header, >> it makes a second call to platform code to read the object memory, and >> we're done. > > Why even do that? The platform code can allocate and read the thing before > it passes it on to the interpreter. Read my example again: > > main.c: > > memory = malloc(heapSize); > fread(imageFile, memory, fsize(imageFile)); > readImageFromHeapSize(memory, heapSize); > interpret(); > > The platform code reads the allocates memory and reads the image. It can do > this by reading a file or (in the Android case) by doing a series of JNI > calls that pass uncompressed portions of a chunked image file from the app > package (yes, it is complicated on Android). Or the support code could read > it from a socket connection. Whatever. There is no reason for the > interpreter to be involved *at all* in reading the image file. > allocating enough space + breathing room for object memory. So, i thought that its better when interpreter controls the memory allocation size, not platform code alone. > Cheers, > - Andreas > >> No file access, no squeakFileOffsetType mess. Nothing. >> A platform code deciding these details by own! >> >> 2010/1/20 Andreas Raab <[hidden email]>: >>> >>> David T. Lewis wrote: >>>> >>>> On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote: >>>>> >>>>> What I'm proposing is an additional VM entry point, called >>>>> >>>>> Interpreter>>readImageFromHeap: heapStart size: heapSize >>>>> >>>>> which takes to arguments: The pointer to the beginning of the >>>>> (pre-allocated) memory portion where the image file has been loaded or >>>>> mapped and a size for the allocated memory portion. The VM then assumes >>>>> that beginning at heapStart there is an image provided that it needs to >>>>> properly prepare (i.e., perform byte-reversal, pointer adjustment etc). >>>> >>>> A suggestion: >>>> >>>> Interpreter>>installImage: address size: imageSize header: header >>> >>> I don't like that. It implies that the support code needs to be able to >>> dissect the image file to find out how much is header and how much is >>> data. >>> That implies the support code needs to know about image format, byte >>> reversal, and several other things. The interpreter already does that so >>> why >>> should we have to duplicate that in the support code again? >>> >>>> Rationale: >>>> >>>> - "Install" is more appropriate that "read", since the contents of the >>>> image file have already been read. >>>> >>>> - "Heap" is not relevant. >>> >>> Fair enough. I don't really care what it's called. >>> >>>> - The file header is logically distinct from from the contents of the >>>> image proper, and since someone else did the reading, it seems better >>>> to pass them as two distinct things (*). The pointer to header is >>>> presumed to be a memory block of at least 64 bytes, so in the usual >>>> case of a 32 bit image read into a block of malloc'ed space, we expect >>>> that address == header + 64. >>> >>> See above. You're making it artificially harder for the support code by >>> requiring it to dissect the file and figure out what the header is. There >>> is >>> really no point to that. >>> >>>> My opinion: Adding the entry point is easy to do and has a clear benefit >>>> for at least one platform of interest, so we should do this. Removing >>>> the existing entry point is debatable and can be done at a later time. >>> >>> Why do you think removing the existing entry point is debatable? I can't >>> see >>> a long-term benefit for using readImageFromFile: given that the new entry >>> point trivially subsumes the old one. >>> >>> Cheers, >>> - Andreas >>> >> >> >> > -- Best regards, Igor Stasenko AKA sig. |
Igor Stasenko wrote: > 2010/1/20 Andreas Raab <[hidden email]>: >> The platform code reads the allocates memory and reads the image. It can do >> this by reading a file or (in the Android case) by doing a series of JNI >> calls that pass uncompressed portions of a chunked image file from the app >> package (yes, it is complicated on Android). Or the support code could read >> it from a socket connection. Whatever. There is no reason for the >> interpreter to be involved *at all* in reading the image file. >> > Well, the current code implies some logic when decoding a header, like > allocating enough space + breathing room for object memory. That's a problem the support code already deals with. The support code already knows the limits on memory that the user specified (via -memory argument or implicitly via mmap() or similar means) and it also already needs to check that the image can actually fit into the heap (or else it'll blow up when trying to read it). So it's similarly trivial for the support code to say: if(heapSize < fsize(imageFile) + headRoom) abort("Not enough memory to load image"); > So, i thought that its better when interpreter controls the memory > allocation size, not platform code alone. The platform already completely controls the allocation. It's only that we allow the interpreter to make the first request and that request is based on what we've passed in as desiredHeapSize from the support code. Lots of useless complexity, the support code might as well just allocate it without involving the interpreter. Cheers, - Andreas > >> Cheers, >> - Andreas >> >>> No file access, no squeakFileOffsetType mess. Nothing. >>> A platform code deciding these details by own! >>> >>> 2010/1/20 Andreas Raab <[hidden email]>: >>>> David T. Lewis wrote: >>>>> On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote: >>>>>> What I'm proposing is an additional VM entry point, called >>>>>> >>>>>> Interpreter>>readImageFromHeap: heapStart size: heapSize >>>>>> >>>>>> which takes to arguments: The pointer to the beginning of the >>>>>> (pre-allocated) memory portion where the image file has been loaded or >>>>>> mapped and a size for the allocated memory portion. The VM then assumes >>>>>> that beginning at heapStart there is an image provided that it needs to >>>>>> properly prepare (i.e., perform byte-reversal, pointer adjustment etc). >>>>> A suggestion: >>>>> >>>>> Interpreter>>installImage: address size: imageSize header: header >>>> I don't like that. It implies that the support code needs to be able to >>>> dissect the image file to find out how much is header and how much is >>>> data. >>>> That implies the support code needs to know about image format, byte >>>> reversal, and several other things. The interpreter already does that so >>>> why >>>> should we have to duplicate that in the support code again? >>>> >>>>> Rationale: >>>>> >>>>> - "Install" is more appropriate that "read", since the contents of the >>>>> image file have already been read. >>>>> >>>>> - "Heap" is not relevant. >>>> Fair enough. I don't really care what it's called. >>>> >>>>> - The file header is logically distinct from from the contents of the >>>>> image proper, and since someone else did the reading, it seems better >>>>> to pass them as two distinct things (*). The pointer to header is >>>>> presumed to be a memory block of at least 64 bytes, so in the usual >>>>> case of a 32 bit image read into a block of malloc'ed space, we expect >>>>> that address == header + 64. >>>> See above. You're making it artificially harder for the support code by >>>> requiring it to dissect the file and figure out what the header is. There >>>> is >>>> really no point to that. >>>> >>>>> My opinion: Adding the entry point is easy to do and has a clear benefit >>>>> for at least one platform of interest, so we should do this. Removing >>>>> the existing entry point is debatable and can be done at a later time. >>>> Why do you think removing the existing entry point is debatable? I can't >>>> see >>>> a long-term benefit for using readImageFromFile: given that the new entry >>>> point trivially subsumes the old one. >>>> >>>> Cheers, >>>> - Andreas >>>> >>> >>> > > > |
In reply to this post by Andreas.Raab
On Tue, Jan 19, 2010 at 05:34:25PM -0800, Andreas Raab wrote: > > David T. Lewis wrote: > > > >A suggestion: > > > > Interpreter>>installImage: address size: imageSize header: header > > I don't like that. It implies that the support code needs to be able to > dissect the image file to find out how much is header and how much is > data. That implies the support code needs to know about image format, > byte reversal, and several other things. The interpreter already does > that so why should we have to duplicate that in the support code again? OK > >My opinion: Adding the entry point is easy to do and has a clear benefit > >for at least one platform of interest, so we should do this. Removing > >the existing entry point is debatable and can be done at a later time. > > Why do you think removing the existing entry point is debatable? I can't > see a long-term benefit for using readImageFromFile: given that the new > entry point trivially subsumes the old one. I don't mean to debate it ;) My point is that it can be handled later when everyone has agreed and the necessary action has been taken to implement it. This may or may not happen quickly; meanwhile adding the new entry point is straightforward. Dave |
In reply to this post by Igor Stasenko
On 2010-01-19, at 6:04 PM, Igor Stasenko wrote: > > 2010/1/20 Andreas Raab <[hidden email]>: >> >> Igor Stasenko wrote: >>> >>> Platform code invoking: >>> >>> sqReadImageFrom(void *object). > Ok, the other thing that goes on here is that I mmap the start of the bytecodes to 500MB on os-x and the iPhone usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize) the headersize that's passed in helps me to adjust where the bytecode start is This is done so that I can avoid swizzling all pointers between restarts on the machines, on os-x and the iPhone byte codes start at 500MB... Note this becomes more interesting if you can avoid swizzling all the pointers at startup time on FLASH based devices so you don't load the entire image into RAM. In my iPhone work for wikiserver the image is 1932 pages, but I only need to load 806 pages into ram to get the first page up. {Yes of course I had to remove the crappy allInstances in various places at startup time}. If you check the pointer swizzle doesn't do anything if the old start of memory offset matchs the new start of memory offset. BTW Loading the extra 1100 pages actually *is* measured in seconds, so the gain is important. -- =========================================================================== John M. McIntosh <[hidden email]> Twitter: squeaker68882 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com =========================================================================== |
In reply to this post by David T. Lewis
David T. Lewis wrote: >>> My opinion: Adding the entry point is easy to do and has a clear benefit >>> for at least one platform of interest, so we should do this. Removing >>> the existing entry point is debatable and can be done at a later time. >> Why do you think removing the existing entry point is debatable? I can't >> see a long-term benefit for using readImageFromFile: given that the new >> entry point trivially subsumes the old one. > > I don't mean to debate it ;) My point is that it can be handled later > when everyone has agreed and the necessary action has been taken to > implement it. This may or may not happen quickly; meanwhile adding the > new entry point is straightforward. Ah. Agreed :-) Cheers, - Andreas |
In reply to this post by Andreas.Raab
2010/1/20 Andreas Raab <[hidden email]>: > > Igor Stasenko wrote: >> >> 2010/1/20 Andreas Raab <[hidden email]>: >>> >>> The platform code reads the allocates memory and reads the image. It can >>> do >>> this by reading a file or (in the Android case) by doing a series of JNI >>> calls that pass uncompressed portions of a chunked image file from the >>> app >>> package (yes, it is complicated on Android). Or the support code could >>> read >>> it from a socket connection. Whatever. There is no reason for the >>> interpreter to be involved *at all* in reading the image file. >>> >> Well, the current code implies some logic when decoding a header, like >> allocating enough space + breathing room for object memory. > > That's a problem the support code already deals with. The support code > already knows the limits on memory that the user specified (via -memory > argument or implicitly via mmap() or similar means) and it also already > needs to check that the image can actually fit into the heap (or else it'll > blow up when trying to read it). So it's similarly trivial for the support > code to say: > > if(heapSize < fsize(imageFile) + headRoom) > abort("Not enough memory to load image"); > >> So, i thought that its better when interpreter controls the memory >> allocation size, not platform code alone. > > The platform already completely controls the allocation. It's only that we > allow the interpreter to make the first request and that request is based on > what we've passed in as desiredHeapSize from the support code. Lots of > useless complexity, the support code might as well just allocate it without > involving the interpreter. > Support code may not know the image size beforehead. Or image file could be combined with something else into a single file , which makes it much larger than actually need for image. You'll never know how much space you need before loading a header and determining the object memory size. The desizered heap size could be actually a maximum heap size available to allocate but not optimal nor minimal. That's why i think its worth to leave allocation at VM side, not in support code, as long as you leaving header reading there. > Cheers, > - Andreas > >> >>> Cheers, >>> - Andreas >>> >>>> No file access, no squeakFileOffsetType mess. Nothing. >>>> A platform code deciding these details by own! >>>> >>>> 2010/1/20 Andreas Raab <[hidden email]>: >>>>> >>>>> David T. Lewis wrote: >>>>>> >>>>>> On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote: >>>>>>> >>>>>>> What I'm proposing is an additional VM entry point, called >>>>>>> >>>>>>> Interpreter>>readImageFromHeap: heapStart size: heapSize >>>>>>> >>>>>>> which takes to arguments: The pointer to the beginning of the >>>>>>> (pre-allocated) memory portion where the image file has been loaded >>>>>>> or >>>>>>> mapped and a size for the allocated memory portion. The VM then >>>>>>> assumes >>>>>>> that beginning at heapStart there is an image provided that it needs >>>>>>> to >>>>>>> properly prepare (i.e., perform byte-reversal, pointer adjustment >>>>>>> etc). >>>>>> >>>>>> A suggestion: >>>>>> >>>>>> Interpreter>>installImage: address size: imageSize header: header >>>>> >>>>> I don't like that. It implies that the support code needs to be able to >>>>> dissect the image file to find out how much is header and how much is >>>>> data. >>>>> That implies the support code needs to know about image format, byte >>>>> reversal, and several other things. The interpreter already does that >>>>> so >>>>> why >>>>> should we have to duplicate that in the support code again? >>>>> >>>>>> Rationale: >>>>>> >>>>>> - "Install" is more appropriate that "read", since the contents of the >>>>>> image file have already been read. >>>>>> >>>>>> - "Heap" is not relevant. >>>>> >>>>> Fair enough. I don't really care what it's called. >>>>> >>>>>> - The file header is logically distinct from from the contents of the >>>>>> image proper, and since someone else did the reading, it seems better >>>>>> to pass them as two distinct things (*). The pointer to header is >>>>>> presumed to be a memory block of at least 64 bytes, so in the usual >>>>>> case of a 32 bit image read into a block of malloc'ed space, we >>>>>> expect >>>>>> that address == header + 64. >>>>> >>>>> See above. You're making it artificially harder for the support code by >>>>> requiring it to dissect the file and figure out what the header is. >>>>> There >>>>> is >>>>> really no point to that. >>>>> >>>>>> My opinion: Adding the entry point is easy to do and has a clear >>>>>> benefit >>>>>> for at least one platform of interest, so we should do this. Removing >>>>>> the existing entry point is debatable and can be done at a later time. >>>>> >>>>> Why do you think removing the existing entry point is debatable? I >>>>> can't >>>>> see >>>>> a long-term benefit for using readImageFromFile: given that the new >>>>> entry >>>>> point trivially subsumes the old one. >>>>> >>>>> Cheers, >>>>> - Andreas >>>>> >>>> >>>> >> >> >> > -- Best regards, Igor Stasenko AKA sig. |
Igor Stasenko wrote: >> The platform already completely controls the allocation. It's only that we >> allow the interpreter to make the first request and that request is based on >> what we've passed in as desiredHeapSize from the support code. Lots of >> useless complexity, the support code might as well just allocate it without >> involving the interpreter. >> > > Support code may not know the image size beforehead. Yes it does. The support code does the reading after all. Since an entire image file needs to be read before it can be interpreted, there is no way that the support code wouldn't know its size. Even if you ship it over the wire (say in a http request) all protocols that I'm aware about have information that encode the size of its payload (Content-Length in http). The fact that the dataLength is also part of the header should be seen as a duplication of information that can be used by the interpreter to verify the integrity of the image. It doesn't mean that this will be the only source of information about the size of the image. > Or image file could be combined with something else into a single file > , which makes it much larger than actually need for image. If you need that, you would implement the means to recognize and reconcile the situation appropriately. For example, on Android I'm not loading the entire package into memory. I'm extracting portions of the zip file and load them into proper locations. I'm doing this because I'm in a non-standard situation that requires adjusting for its tradeoffs. However, we know that the "standard situation" is one in which loading the entire image file is entirely appropriate. Moreoever, we know that regardless of the method of I/O we're using we always need to produce a complete image file first. Since we know that, and since the method of I/O will differ depending on the context in which your VM runs in, it is not meaningful to worry about situations that you describe - nobody would load more than they'd have to unless that's considered a reasonable tradeoff (there may be situations where this is actually advantageous). There is really nothing to worry about here. The standard situation is that we can read an entire image file. The non-standard situations have to be dealt with on a case-by-case basis but in any case they'll have a means of identifying the image, loading into memory and providing it to the interpreter. There is no reason to worry about someone writing the support code just randomly passing excess bytes to the interpreter. Cheers, - Andreas |
2010/1/20 Andreas Raab <[hidden email]>: > > Igor Stasenko wrote: >>> >>> The platform already completely controls the allocation. It's only that >>> we >>> allow the interpreter to make the first request and that request is based >>> on >>> what we've passed in as desiredHeapSize from the support code. Lots of >>> useless complexity, the support code might as well just allocate it >>> without >>> involving the interpreter. >>> >> >> Support code may not know the image size beforehead. > > Yes it does. The support code does the reading after all. Since an entire > image file needs to be read before it can be interpreted, there is no way > that the support code wouldn't know its size. Even if you ship it over the > wire (say in a http request) all protocols that I'm aware about have > information that encode the size of its payload (Content-Length in http). > The fact that the dataLength is also part of the header should be seen as a > duplication of information that can be used by the interpreter to verify the > integrity of the image. It doesn't mean that this will be the only source of > information about the size of the image. > >> Or image file could be combined with something else into a single file >> , which makes it much larger than actually need for image. > > If you need that, you would implement the means to recognize and reconcile > the situation appropriately. For example, on Android I'm not loading the > entire package into memory. I'm extracting portions of the zip file and load > them into proper locations. I'm doing this because I'm in a non-standard > situation that requires adjusting for its tradeoffs. > > However, we know that the "standard situation" is one in which loading the > entire image file is entirely appropriate. Moreoever, we know that > regardless of the method of I/O we're using we always need to produce a > complete image file first. Since we know that, and since the method of I/O > will differ depending on the context in which your VM runs in, it is not > meaningful to worry about situations that you describe - nobody would load > more than they'd have to unless that's considered a reasonable tradeoff > (there may be situations where this is actually advantageous). > > There is really nothing to worry about here. The standard situation is that > we can read an entire image file. The non-standard situations have to be > dealt with on a case-by-case basis but in any case they'll have a means of > identifying the image, loading into memory and providing it to the > interpreter. There is no reason to worry about someone writing the support > code just randomly passing excess bytes to the interpreter. > Well, i mainly meant some kind of a network startup scenario, when image data is feeded directly from socket. Platform code could pass the socket to interpreter (as void* object), and then wait for allocation & read requests which will follow. And so, you don't have to use some intermediate memory buffer to preload an image and only then pass it to interpreter entry point. > Cheers, > - Andreas > -- Best regards, Igor Stasenko AKA sig. |
Igor Stasenko wrote: > Well, i mainly meant some kind of a network startup scenario, when > image data is feeded directly from socket. > Platform code could pass the socket to interpreter (as void* object), > and then wait for allocation & read requests which will follow. > And so, you don't have to use some intermediate memory buffer to > preload an image and only then pass > it to interpreter entry point. But Igor, unless you are *trying* to be obscure why would your network protocol (that the support code needs to implement anyway) not include any information about how many bytes are to come? See my previous example about http - you could trivially launch an image by giving it an HTTP url; download the image into memory and launch the interpreter without having to involve the interpreter at all. This is *necessarily* true for *all* I/O operations that load an image. There simply is no point to pass a void* to the interpreter only have it passed back as a void* to the support code. If you're doing that, why not load the thing right away instead of unnecessarily passing pointers back and forth? Cheers, - Andreas |
2010/1/20 Andreas Raab <[hidden email]>: > > Igor Stasenko wrote: >> >> Well, i mainly meant some kind of a network startup scenario, when >> image data is feeded directly from socket. >> Platform code could pass the socket to interpreter (as void* object), >> and then wait for allocation & read requests which will follow. >> And so, you don't have to use some intermediate memory buffer to >> preload an image and only then pass >> it to interpreter entry point. > > But Igor, unless you are *trying* to be obscure why would your network > protocol (that the support code needs to implement anyway) not include any > information about how many bytes are to come? See my previous example about > http - you could trivially launch an image by giving it an HTTP url; > download the image into memory and launch the interpreter without having to > involve the interpreter at all. > > This is *necessarily* true for *all* I/O operations that load an image. > There simply is no point to pass a void* to the interpreter only have it > passed back as a void* to the support code. If you're doing that, why not > load the thing right away instead of unnecessarily passing pointers back and > forth? > in interpreter. Why one should wait , downloading the whole image, only after that to discover that image format is not compatible with interpreter, or not enough memory etc etc? > Cheers, > - Andreas > -- Best regards, Igor Stasenko AKA sig. |
Free forum by Nabble | Edit this page |