Smalltalk › Squeak › Squeak VM

[Fwd: Image file loading]

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

27 messages Options

Andreas.Raab

[Fwd: Image file loading]

Apologies, accidentally posted to squeak-dev instead of vm-dev.

- A.

-------- Original Message --------
Subject: Image file loading
Date: Tue, 19 Jan 2010 14:06:00 -0800
From: Andreas Raab <[hidden email]>
Reply-To: The general-purpose Squeak developers list
<[hidden email]>
To: The general-purpose Squeak developers list
<[hidden email]>
Newsgroups: gmane.comp.lang.smalltalk.squeak.general

Folks -

I think that one of the more clever hacks I did for the Android VM
deserves proper generalization in the Interpreter. Since on Android the
image is part of the (compressed) application package it's not easily
possible to pass a file handle into the VM and have the VM read the
image as an external file.

Instead, I'm preloading the image file into memory and then hacked an
implementation of the sqImageFile* and sqAlloc* functions that would
basically operate on the image in memory. I think we should formalize
this idea for two reasons:

a) It allows preloading the image into memory on platforms where the
image may not be in an external file.

b) It trivially allows mmap()ing the entire image into memory (I think
John does that on the iPhone).

What I'm proposing is an additional VM entry point, called

Interpreter>>readImageFromHeap: heapStart size: heapSize

which takes to arguments: The pointer to the beginning of the
(pre-allocated) memory portion where the image file has been loaded or
mapped and a size for the allocated memory portion. The VM then assumes
that beginning at heapStart there is an image provided that it needs to
properly prepare (i.e., perform byte-reversal, pointer adjustment etc).

The VMs would generally use this along the lines of:

memStart = self malloc(heapSize);
fread(imageFile, memStart, fsize(imageFile));
readImageFromHeapSize(memStart, heapSize);

and of course a backwards compatible version of
readImageFromFile:HeapSize:StartingAt: could do the same (although I'd
rather drop that function altogether since it removes the need for
supporting various sqImageFile* functions).

I'd also say that the same pattern should be used for *writing* image
files, that is we assemble the entire image in memory then call
sqImageFileWrite(fileName, heapStart, heapSize) and sqImageFileWrite is
really just a combo of fopen(); fwrite(); fclose();

I think this scheme is much nicer, less complicated and offers more
options for implementing image loading than what we currently have.

Comments?

Cheers,
- Andreas

johnmci

Re: [Fwd: Image file loading]

On 2010-01-19, at 2:30 PM, Andreas Raab wrote:

> b) It trivially allows mmap()ing the entire image into memory (I think
> John does that on the iPhone).

So how is this different that what happens now?

In interp.h we have:

#ifndef allocateMemoryMinimumImageFileHeaderSize
/* Called by Interpreter>>allocateMemory:minimum:imageFile:headerSize: */
#define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemory(minimumMemory, heapSize)
#endif

#ifndef sqImageFileReadEntireImage
/* Called by Interpreter>>sqImage:read:size:length: */
#define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileRead(memoryAddress, elementSize, length, fileStream)
#endif

In platform specfic I change that

usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize);
#define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemoryMac(heapSize, minimumMemory, fileStream, headerSize)

#ifdef BUILD_FOR_OSX
size_t sqImageFileReadEntireImage(void *ptr, size_t elementSize, size_t count, FILE * f);
#define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream)
#else
#define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) length
#endif

The interp.c slang generated below on os-x the sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream)
does read the data from fileStream for the given length into memoryAddress. But on the iPhone we just return the length as per the #define above
since the allocateMemoryMinimumImageFileHeaderSize has mmapped the file.

slang...

memory = allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, f, headerSize);
if (memory == null) {
insufficientMemoryAvailableError();
}
...
sqImageFileSeek(f, headerStart + headerSize);
/* begin sqImage:read:size:length: */
memoryAddress = pointerForOop(memory);
bytesRead = sqImageFileReadEntireImage(memoryAddress, sizeof(unsigned char), dataSize, f);
if (bytesRead != dataSize) {
unableToReadImageError();
}
--
===========================================================================
John M. McIntosh <[hidden email]> Twitter: squeaker68882
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================

Andreas.Raab

Re: [Fwd: Image file loading]

John M McIntosh wrote:
> On 2010-01-19, at 2:30 PM, Andreas Raab wrote:
>
>> b) It trivially allows mmap()ing the entire image into memory (I think
>> John does that on the iPhone).
>
> So how is this different that what happens now?

That it removes hundreds of lines of completely unnecessary code. Yes,
one can implement the same logic with the current interfaces, I just did
that on Android :-) But at some point one must question the approach if
the end result is only adding additional complexity.

That's why I'm saying move all of this out so that the interpreter
doesn't have to deal with it. The task of allocating the initial amount
of memory and loading the image into said memory should be done by
support code *before* calling the interpreter the first time. Then, give
the interpreter one entry point and then remove all the extra
complexity. Which means:

> #ifndef allocateMemoryMinimumImageFileHeaderSize

Gets removed.

> /* Called by Interpreter>>allocateMemory:minimum:imageFile:headerSize: */
> #define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemory(minimumMemory, heapSize)

Gets removed.

> #ifndef sqImageFileReadEntireImage

Gets removed.

> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileRead(memoryAddress, elementSize, length, fileStream)

Gets removed.

> In platform specfic I change that
>
> usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize);

Gets removed.

> #ifdef BUILD_FOR_OSX
> size_t sqImageFileReadEntireImage(void *ptr, size_t elementSize, size_t count, FILE * f);
> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream)
> #else
> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) length
> #endif

Gets removed.

Some more things that get removed is in
CCodeGenerator>>writeDefaultMacrosOn: a whole bunch of sqImageFile*
definitions in sq.h and more.

Cheers,
- Andreas

johnmci

Re: [Fwd: Image file loading]

For anonymous mmap I need
fread(imageFile, memStart, fsize(imageFile));
to mmap allocate and read the file into the anonymous mmap

readImageFromHeapSize
does nothing.

or for file based mmap I need
fread(imageFile, memStart, fsize(imageFile));
to mmap the image file.

So I need the readImageFromHeapSize for?

On 2010-01-19, at 3:09 PM, Andreas Raab wrote:

> John M McIntosh wrote:
>> On 2010-01-19, at 2:30 PM, Andreas Raab wrote:
>>> b) It trivially allows mmap()ing the entire image into memory (I think
>>> John does that on the iPhone).
>> So how is this different that what happens now?
>
> That it removes hundreds of lines of completely unnecessary code. Yes, one can implement the same logic with the current interfaces, I just did that on Android :-) But at some point one must question the approach if the end result is only adding additional complexity.
>
> That's why I'm saying move all of this out so that the interpreter doesn't have to deal with it. The task of allocating the initial amount of memory and loading the image into said memory should be done by support code *before* calling the interpreter the first time. Then, give the interpreter one entry point and then remove all the extra complexity. Which means:
>
>> #ifndef allocateMemoryMinimumImageFileHeaderSize
>
> Gets removed.
>
>> /* Called by Interpreter>>allocateMemory:minimum:imageFile:headerSize: */
>> #define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemory(minimumMemory, heapSize)
>
> Gets removed.
>
>> #ifndef sqImageFileReadEntireImage
>
> Gets removed.
>
>> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileRead(memoryAddress, elementSize, length, fileStream)
>
> Gets removed.
>
>> In platform specfic I change that usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize);
>
> Gets removed.
>
>> #ifdef BUILD_FOR_OSX
>> size_t sqImageFileReadEntireImage(void *ptr, size_t elementSize, size_t count, FILE * f);
>> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream)
>> #else
>> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) length #endif
>
> Gets removed.
>
> Some more things that get removed is in CCodeGenerator>>writeDefaultMacrosOn: a whole bunch of sqImageFile* definitions in sq.h and more.
>
> Cheers,
> - Andreas

--
===========================================================================
John M. McIntosh <[hidden email]> Twitter: squeaker68882
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================

Andreas.Raab

Re: [Fwd: Image file loading]

John M McIntosh wrote:

> For anonymous mmap I need
> fread(imageFile, memStart, fsize(imageFile));
> to mmap allocate and read the file into the anonymous mmap
>
> readImageFromHeapSize does nothing.
>
> or for file based mmap I need
> fread(imageFile, memStart, fsize(imageFile));
> to mmap the image file.
>
> So I need the readImageFromHeapSize for?

To tell the VM where your memory is located (replacement of sqAlloc*),
and how much you've pre-allocated. Like I wrote in the original post:

"The VMs would generally use this along the lines of:

memStart = malloc(heapSize);
fread(imageFile, memStart, fsize(imageFile));
readImageFromHeapSize(memStart, heapSize);

etc."

Cheers,
- Andreas

> On 2010-01-19, at 3:09 PM, Andreas Raab wrote:
>
>> John M McIntosh wrote:
>>> On 2010-01-19, at 2:30 PM, Andreas Raab wrote:
>>>> b) It trivially allows mmap()ing the entire image into memory (I think
>>>> John does that on the iPhone).
>>> So how is this different that what happens now?
>> That it removes hundreds of lines of completely unnecessary code. Yes, one can implement the same logic with the current interfaces, I just did that on Android :-) But at some point one must question the approach if the end result is only adding additional complexity.
>>
>> That's why I'm saying move all of this out so that the interpreter doesn't have to deal with it. The task of allocating the initial amount of memory and loading the image into said memory should be done by support code *before* calling the interpreter the first time. Then, give the interpreter one entry point and then remove all the extra complexity. Which means:
>>
>>> #ifndef allocateMemoryMinimumImageFileHeaderSize
>> Gets removed.
>>
>>> /* Called by Interpreter>>allocateMemory:minimum:imageFile:headerSize: */
>>> #define allocateMemoryMinimumImageFileHeaderSize(heapSize, minimumMemory, fileStream, headerSize) sqAllocateMemory(minimumMemory, heapSize)
>> Gets removed.
>>
>>> #ifndef sqImageFileReadEntireImage
>> Gets removed.
>>
>>> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileRead(memoryAddress, elementSize, length, fileStream)
>> Gets removed.
>>
>>> In platform specfic I change that usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize);
>> Gets removed.
>>
>>> #ifdef BUILD_FOR_OSX
>>> size_t sqImageFileReadEntireImage(void *ptr, size_t elementSize, size_t count, FILE * f);
>>> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream)
>>> #else
>>> #define sqImageFileReadEntireImage(memoryAddress, elementSize, length, fileStream) length #endif
>> Gets removed.
>>
>> Some more things that get removed is in CCodeGenerator>>writeDefaultMacrosOn: a whole bunch of sqImageFile* definitions in sq.h and more.
>>
>> Cheers,
>> - Andreas
>
> --
> ===========================================================================
> John M. McIntosh <[hidden email]> Twitter: squeaker68882
> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
> ===========================================================================
>
>
>
>

David T. Lewis

Re: [Fwd: Image file loading]

In reply to this post by Andreas.Raab

On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote:

>
> What I'm proposing is an additional VM entry point, called
>
> Interpreter>>readImageFromHeap: heapStart size: heapSize
>
> which takes to arguments: The pointer to the beginning of the
> (pre-allocated) memory portion where the image file has been loaded or
> mapped and a size for the allocated memory portion. The VM then assumes
> that beginning at heapStart there is an image provided that it needs to
> properly prepare (i.e., perform byte-reversal, pointer adjustment etc).

A suggestion:

Interpreter>>installImage: address size: imageSize header: header

Where header is the address of the image file header, and address is
the address of the uninitialized image beginning after the file header.

Rationale:

- "Install" is more appropriate that "read", since the contents of the
image file have already been read.

- "Heap" is not relevant.

- The file header is logically distinct from from the contents of the
image proper, and since someone else did the reading, it seems better
to pass them as two distinct things (*). The pointer to header is
presumed to be a memory block of at least 64 bytes, so in the usual
case of a 32 bit image read into a block of malloc'ed space, we expect
that address == header + 64.

My opinion: Adding the entry point is easy to do and has a clear benefit
for at least one platform of interest, so we should do this. Removing
the existing entry point is debatable and can be done at a later time.

Dave

(*) I've noticed that the format of the file header for 64-bit images
is not optimal, so a logical separation of the file header from the
image proper just seems like the right thing to do.

Andreas.Raab

Re: [Fwd: Image file loading]

David T. Lewis wrote:

> On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote:
>> What I'm proposing is an additional VM entry point, called
>>
>> Interpreter>>readImageFromHeap: heapStart size: heapSize
>>
>> which takes to arguments: The pointer to the beginning of the
>> (pre-allocated) memory portion where the image file has been loaded or
>> mapped and a size for the allocated memory portion. The VM then assumes
>> that beginning at heapStart there is an image provided that it needs to
>> properly prepare (i.e., perform byte-reversal, pointer adjustment etc).
>
> A suggestion:
>
> Interpreter>>installImage: address size: imageSize header: header

I don't like that. It implies that the support code needs to be able to
dissect the image file to find out how much is header and how much is
data. That implies the support code needs to know about image format,
byte reversal, and several other things. The interpreter already does
that so why should we have to duplicate that in the support code again?

> Rationale:
>
> - "Install" is more appropriate that "read", since the contents of the
> image file have already been read.
>
> - "Heap" is not relevant.

Fair enough. I don't really care what it's called.

> - The file header is logically distinct from from the contents of the
> image proper, and since someone else did the reading, it seems better
> to pass them as two distinct things (*). The pointer to header is
> presumed to be a memory block of at least 64 bytes, so in the usual
> case of a 32 bit image read into a block of malloc'ed space, we expect
> that address == header + 64.

See above. You're making it artificially harder for the support code by
requiring it to dissect the file and figure out what the header is.
There is really no point to that.

> My opinion: Adding the entry point is easy to do and has a clear benefit
> for at least one platform of interest, so we should do this. Removing
> the existing entry point is debatable and can be done at a later time.

Why do you think removing the existing entry point is debatable? I can't
see a long-term benefit for using readImageFromFile: given that the new
entry point trivially subsumes the old one.

Cheers,
- Andreas

Igor Stasenko

Re: [Fwd: Image file loading]

In reply to this post by Andreas.Raab

2010/1/20 Andreas Raab <[hidden email]>:

>
> John M McIntosh wrote:
>>
>> For anonymous mmap I need fread(imageFile, memStart, fsize(imageFile));
>> to mmap allocate and read the file into the anonymous mmap
>>
>> readImageFromHeapSize does nothing.
>> or for file based mmap I need fread(imageFile, memStart,
>> fsize(imageFile));
>> to mmap the image file.
>> So I need the readImageFromHeapSize for?
>
> To tell the VM where your memory is located (replacement of sqAlloc*), and
> how much you've pre-allocated. Like I wrote in the original post:
>
> "The VMs would generally use this along the lines of:
>
> memStart = malloc(heapSize);
> fread(imageFile, memStart, fsize(imageFile));
> readImageFromHeapSize(memStart, heapSize);
>
> etc."
>

RIGHT! Who said that image could be read only from file?
It could be a memory location, it could be a socket stream, it could
be anything.

So, i very like your idea with replacing an entry point by abstract

readImageFromObject: object size: heapSize

which could be just a 2-liner:

memStart := self sqMalloc: heapSize.
self ioReadFrom: object Into: memStart Size: heapSize.

(and object is of type 'void *', which leaves a platform code to
decide its meaning)

sqMalloc() and ioReadFromIntoSize(), also provided by platform-specific code)..

This makes an interpreter completely agnostic from data source, where
actual image is stored.

> Cheers,
> - Andreas
>

--
Best regards,
Igor Stasenko AKA sig.

Igor Stasenko

Re: [Fwd: Image file loading]

In reply to this post by Andreas.Raab

Ohh, even better.

Platform code invoking:

sqReadImageFrom(void *object).

then, an interpreter reading a header and determining a heap size.
(a question, though, does image size encoded in header?)

once interpreter prepares a memory for image and decoding the header,
it makes a second call to platform code to read the object memory, and
we're done.

No file access, no squeakFileOffsetType mess. Nothing.
A platform code deciding these details by own!

2010/1/20 Andreas Raab <[hidden email]>:

>
> David T. Lewis wrote:
>>
>> On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote:
>>>
>>> What I'm proposing is an additional VM entry point, called
>>>
>>> Interpreter>>readImageFromHeap: heapStart size: heapSize
>>>
>>> which takes to arguments: The pointer to the beginning of the
>>> (pre-allocated) memory portion where the image file has been loaded or
>>> mapped and a size for the allocated memory portion. The VM then assumes
>>> that beginning at heapStart there is an image provided that it needs to
>>> properly prepare (i.e., perform byte-reversal, pointer adjustment etc).
>>
>> A suggestion:
>>
>> Interpreter>>installImage: address size: imageSize header: header
>
> I don't like that. It implies that the support code needs to be able to
> dissect the image file to find out how much is header and how much is data.
> That implies the support code needs to know about image format, byte
> reversal, and several other things. The interpreter already does that so why
> should we have to duplicate that in the support code again?
>
>> Rationale:
>>
>> - "Install" is more appropriate that "read", since the contents of the
>> image file have already been read.
>>
>> - "Heap" is not relevant.
>
> Fair enough. I don't really care what it's called.
>
>> - The file header is logically distinct from from the contents of the
>> image proper, and since someone else did the reading, it seems better
>> to pass them as two distinct things (*). The pointer to header is
>> presumed to be a memory block of at least 64 bytes, so in the usual
>> case of a 32 bit image read into a block of malloc'ed space, we expect
>> that address == header + 64.
>
> See above. You're making it artificially harder for the support code by
> requiring it to dissect the file and figure out what the header is. There is
> really no point to that.
>
>> My opinion: Adding the entry point is easy to do and has a clear benefit
>> for at least one platform of interest, so we should do this. Removing
>> the existing entry point is debatable and can be done at a later time.
>
> Why do you think removing the existing entry point is debatable? I can't see
> a long-term benefit for using readImageFromFile: given that the new entry
> point trivially subsumes the old one.
>
> Cheers,
> - Andreas
>

--
Best regards,
Igor Stasenko AKA sig.

Andreas.Raab

Re: [Vm-dev] [Fwd: Image file loading]

Igor Stasenko wrote:
> Platform code invoking:
>
> sqReadImageFrom(void *object).
>
> then, an interpreter reading a header and determining a heap size.
> (a question, though, does image size encoded in header?)

The image size is encoded in the file header. The heap size isn't.
That's where the second argument came from that I provided.

> once interpreter prepares a memory for image and decoding the header,
> it makes a second call to platform code to read the object memory, and
> we're done.

Why even do that? The platform code can allocate and read the thing
before it passes it on to the interpreter. Read my example again:

main.c:

memory = malloc(heapSize);
fread(imageFile, memory, fsize(imageFile));
readImageFromHeapSize(memory, heapSize);
interpret();

The platform code reads the allocates memory and reads the image. It can
do this by reading a file or (in the Android case) by doing a series of
JNI calls that pass uncompressed portions of a chunked image file from
the app package (yes, it is complicated on Android). Or the support code
could read it from a socket connection. Whatever. There is no reason for
the interpreter to be involved *at all* in reading the image file.

Cheers,
- Andreas

> No file access, no squeakFileOffsetType mess. Nothing.
> A platform code deciding these details by own!
>
> 2010/1/20 Andreas Raab <[hidden email]>:
>> David T. Lewis wrote:
>>> On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote:
>>>> What I'm proposing is an additional VM entry point, called
>>>>
>>>> Interpreter>>readImageFromHeap: heapStart size: heapSize
>>>>
>>>> which takes to arguments: The pointer to the beginning of the
>>>> (pre-allocated) memory portion where the image file has been loaded or
>>>> mapped and a size for the allocated memory portion. The VM then assumes
>>>> that beginning at heapStart there is an image provided that it needs to
>>>> properly prepare (i.e., perform byte-reversal, pointer adjustment etc).
>>> A suggestion:
>>>
>>> Interpreter>>installImage: address size: imageSize header: header
>> I don't like that. It implies that the support code needs to be able to
>> dissect the image file to find out how much is header and how much is data.
>> That implies the support code needs to know about image format, byte
>> reversal, and several other things. The interpreter already does that so why
>> should we have to duplicate that in the support code again?
>>
>>> Rationale:
>>>
>>> - "Install" is more appropriate that "read", since the contents of the
>>> image file have already been read.
>>>
>>> - "Heap" is not relevant.
>> Fair enough. I don't really care what it's called.
>>
>>> - The file header is logically distinct from from the contents of the
>>> image proper, and since someone else did the reading, it seems better
>>> to pass them as two distinct things (*). The pointer to header is
>>> presumed to be a memory block of at least 64 bytes, so in the usual
>>> case of a 32 bit image read into a block of malloc'ed space, we expect
>>> that address == header + 64.
>> See above. You're making it artificially harder for the support code by
>> requiring it to dissect the file and figure out what the header is. There is
>> really no point to that.
>>
>>> My opinion: Adding the entry point is easy to do and has a clear benefit
>>> for at least one platform of interest, so we should do this. Removing
>>> the existing entry point is debatable and can be done at a later time.
>> Why do you think removing the existing entry point is debatable? I can't see
>> a long-term benefit for using readImageFromFile: given that the new entry
>> point trivially subsumes the old one.
>>
>> Cheers,
>> - Andreas
>>
>
>
>

Igor Stasenko

Re: [Vm-dev] [Fwd: Image file loading]

2010/1/20 Andreas Raab <[hidden email]>:

>
> Igor Stasenko wrote:
>>
>> Platform code invoking:
>>
>> sqReadImageFrom(void *object).
>>
>> then, an interpreter reading a header and determining a heap size.
>> (a question, though, does image size encoded in header?)
>
> The image size is encoded in the file header. The heap size isn't. That's
> where the second argument came from that I provided.
>
>> once interpreter prepares a memory for image and decoding the header,
>> it makes a second call to platform code to read the object memory, and
>> we're done.
>
> Why even do that? The platform code can allocate and read the thing before
> it passes it on to the interpreter. Read my example again:
>
> main.c:
>
> memory = malloc(heapSize);
> fread(imageFile, memory, fsize(imageFile));
> readImageFromHeapSize(memory, heapSize);
> interpret();
>
> The platform code reads the allocates memory and reads the image. It can do
> this by reading a file or (in the Android case) by doing a series of JNI
> calls that pass uncompressed portions of a chunked image file from the app
> package (yes, it is complicated on Android). Or the support code could read
> it from a socket connection. Whatever. There is no reason for the
> interpreter to be involved *at all* in reading the image file.
>

Well, the current code implies some logic when decoding a header, like
allocating enough space + breathing room for object memory.

So, i thought that its better when interpreter controls the memory
allocation size, not platform code alone.

> Cheers,
> - Andreas
>
>> No file access, no squeakFileOffsetType mess. Nothing.
>> A platform code deciding these details by own!
>>
>> 2010/1/20 Andreas Raab <[hidden email]>:
>>>
>>> David T. Lewis wrote:
>>>>
>>>> On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote:
>>>>>
>>>>> What I'm proposing is an additional VM entry point, called
>>>>>
>>>>> Interpreter>>readImageFromHeap: heapStart size: heapSize
>>>>>
>>>>> which takes to arguments: The pointer to the beginning of the
>>>>> (pre-allocated) memory portion where the image file has been loaded or
>>>>> mapped and a size for the allocated memory portion. The VM then assumes
>>>>> that beginning at heapStart there is an image provided that it needs to
>>>>> properly prepare (i.e., perform byte-reversal, pointer adjustment etc).
>>>>
>>>> A suggestion:
>>>>
>>>> Interpreter>>installImage: address size: imageSize header: header
>>>
>>> I don't like that. It implies that the support code needs to be able to
>>> dissect the image file to find out how much is header and how much is
>>> data.
>>> That implies the support code needs to know about image format, byte
>>> reversal, and several other things. The interpreter already does that so
>>> why
>>> should we have to duplicate that in the support code again?
>>>
>>>> Rationale:
>>>>
>>>> - "Install" is more appropriate that "read", since the contents of the
>>>> image file have already been read.
>>>>
>>>> - "Heap" is not relevant.
>>>
>>> Fair enough. I don't really care what it's called.
>>>
>>>> - The file header is logically distinct from from the contents of the
>>>> image proper, and since someone else did the reading, it seems better
>>>> to pass them as two distinct things (*). The pointer to header is
>>>> presumed to be a memory block of at least 64 bytes, so in the usual
>>>> case of a 32 bit image read into a block of malloc'ed space, we expect
>>>> that address == header + 64.
>>>
>>> See above. You're making it artificially harder for the support code by
>>> requiring it to dissect the file and figure out what the header is. There
>>> is
>>> really no point to that.
>>>
>>>> My opinion: Adding the entry point is easy to do and has a clear benefit
>>>> for at least one platform of interest, so we should do this. Removing
>>>> the existing entry point is debatable and can be done at a later time.
>>>
>>> Why do you think removing the existing entry point is debatable? I can't
>>> see
>>> a long-term benefit for using readImageFromFile: given that the new entry
>>> point trivially subsumes the old one.
>>>
>>> Cheers,
>>> - Andreas
>>>
>>
>>
>>
>

--
Best regards,
Igor Stasenko AKA sig.

Andreas.Raab

Re: [Vm-dev] [Fwd: Image file loading]

Igor Stasenko wrote:

> 2010/1/20 Andreas Raab <[hidden email]>:
>> The platform code reads the allocates memory and reads the image. It can do
>> this by reading a file or (in the Android case) by doing a series of JNI
>> calls that pass uncompressed portions of a chunked image file from the app
>> package (yes, it is complicated on Android). Or the support code could read
>> it from a socket connection. Whatever. There is no reason for the
>> interpreter to be involved *at all* in reading the image file.
>>
> Well, the current code implies some logic when decoding a header, like
> allocating enough space + breathing room for object memory.

That's a problem the support code already deals with. The support code
already knows the limits on memory that the user specified (via -memory
argument or implicitly via mmap() or similar means) and it also already
needs to check that the image can actually fit into the heap (or else
it'll blow up when trying to read it). So it's similarly trivial for the
support code to say:

if(heapSize < fsize(imageFile) + headRoom)
abort("Not enough memory to load image");

> So, i thought that its better when interpreter controls the memory
> allocation size, not platform code alone.

The platform already completely controls the allocation. It's only that
we allow the interpreter to make the first request and that request is
based on what we've passed in as desiredHeapSize from the support code.
Lots of useless complexity, the support code might as well just allocate
it without involving the interpreter.

Cheers,
- Andreas

>
>> Cheers,
>> - Andreas
>>
>>> No file access, no squeakFileOffsetType mess. Nothing.
>>> A platform code deciding these details by own!
>>>
>>> 2010/1/20 Andreas Raab <[hidden email]>:
>>>> David T. Lewis wrote:
>>>>> On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote:
>>>>>> What I'm proposing is an additional VM entry point, called
>>>>>>
>>>>>> Interpreter>>readImageFromHeap: heapStart size: heapSize
>>>>>>
>>>>>> which takes to arguments: The pointer to the beginning of the
>>>>>> (pre-allocated) memory portion where the image file has been loaded or
>>>>>> mapped and a size for the allocated memory portion. The VM then assumes
>>>>>> that beginning at heapStart there is an image provided that it needs to
>>>>>> properly prepare (i.e., perform byte-reversal, pointer adjustment etc).
>>>>> A suggestion:
>>>>>
>>>>> Interpreter>>installImage: address size: imageSize header: header
>>>> I don't like that. It implies that the support code needs to be able to
>>>> dissect the image file to find out how much is header and how much is
>>>> data.
>>>> That implies the support code needs to know about image format, byte
>>>> reversal, and several other things. The interpreter already does that so
>>>> why
>>>> should we have to duplicate that in the support code again?
>>>>
>>>>> Rationale:
>>>>>
>>>>> - "Install" is more appropriate that "read", since the contents of the
>>>>> image file have already been read.
>>>>>
>>>>> - "Heap" is not relevant.
>>>> Fair enough. I don't really care what it's called.
>>>>
>>>>> - The file header is logically distinct from from the contents of the
>>>>> image proper, and since someone else did the reading, it seems better
>>>>> to pass them as two distinct things (*). The pointer to header is
>>>>> presumed to be a memory block of at least 64 bytes, so in the usual
>>>>> case of a 32 bit image read into a block of malloc'ed space, we expect
>>>>> that address == header + 64.
>>>> See above. You're making it artificially harder for the support code by
>>>> requiring it to dissect the file and figure out what the header is. There
>>>> is
>>>> really no point to that.
>>>>
>>>>> My opinion: Adding the entry point is easy to do and has a clear benefit
>>>>> for at least one platform of interest, so we should do this. Removing
>>>>> the existing entry point is debatable and can be done at a later time.
>>>> Why do you think removing the existing entry point is debatable? I can't
>>>> see
>>>> a long-term benefit for using readImageFromFile: given that the new entry
>>>> point trivially subsumes the old one.
>>>>
>>>> Cheers,
>>>> - Andreas
>>>>
>>>
>>>
>
>
>

David T. Lewis

Re: [Vm-dev] [Fwd: Image file loading]

In reply to this post by Andreas.Raab

On Tue, Jan 19, 2010 at 05:34:25PM -0800, Andreas Raab wrote:

>
> David T. Lewis wrote:
> >
> >A suggestion:
> >
> > Interpreter>>installImage: address size: imageSize header: header
>
> I don't like that. It implies that the support code needs to be able to
> dissect the image file to find out how much is header and how much is
> data. That implies the support code needs to know about image format,
> byte reversal, and several other things. The interpreter already does
> that so why should we have to duplicate that in the support code again?

OK

> >My opinion: Adding the entry point is easy to do and has a clear benefit
> >for at least one platform of interest, so we should do this. Removing
> >the existing entry point is debatable and can be done at a later time.
>
> Why do you think removing the existing entry point is debatable? I can't
> see a long-term benefit for using readImageFromFile: given that the new
> entry point trivially subsumes the old one.

I don't mean to debate it ;) My point is that it can be handled later
when everyone has agreed and the necessary action has been taken to
implement it. This may or may not happen quickly; meanwhile adding the
new entry point is straightforward.

Dave

johnmci

Re: [Vm-dev] [Fwd: Image file loading]

In reply to this post by Igor Stasenko

On 2010-01-19, at 6:04 PM, Igor Stasenko wrote:

>
> 2010/1/20 Andreas Raab <[hidden email]>:
>>
>> Igor Stasenko wrote:
>>>
>>> Platform code invoking:
>>>
>>> sqReadImageFrom(void *object).
>

Ok, the other thing that goes on here is that I mmap the start of the bytecodes to 500MB on os-x and the iPhone

usqInt sqAllocateMemoryMac(sqInt minHeapSize, sqInt *desiredHeapSize, FILE * f,usqInt headersize)

the headersize that's passed in helps me to adjust where the bytecode start is

This is done so that I can avoid swizzling all pointers between restarts on the machines, on os-x and the iPhone byte codes start at 500MB...

Note this becomes more interesting if you can avoid swizzling all the pointers at startup time on FLASH based devices so
you don't load the entire image into RAM. In my iPhone work for wikiserver the image is 1932 pages, but I only need to load 806 pages
into ram to get the first page up. {Yes of course I had to remove the crappy allInstances in various places at startup time}. If you check
the pointer swizzle doesn't do anything if the old start of memory offset matchs the new start of memory offset.

BTW Loading the extra 1100 pages actually *is* measured in seconds, so the gain is important.

--
===========================================================================
John M. McIntosh <[hidden email]> Twitter: squeaker68882
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================

Andreas.Raab

Re: [Vm-dev] [Fwd: Image file loading]

In reply to this post by David T. Lewis

David T. Lewis wrote:

>>> My opinion: Adding the entry point is easy to do and has a clear benefit
>>> for at least one platform of interest, so we should do this. Removing
>>> the existing entry point is debatable and can be done at a later time.
>> Why do you think removing the existing entry point is debatable? I can't
>> see a long-term benefit for using readImageFromFile: given that the new
>> entry point trivially subsumes the old one.
>
> I don't mean to debate it ;) My point is that it can be handled later
> when everyone has agreed and the necessary action has been taken to
> implement it. This may or may not happen quickly; meanwhile adding the
> new entry point is straightforward.

Ah. Agreed :-)

Cheers,
- Andreas

Igor Stasenko

Re: [Vm-dev] [Fwd: Image file loading]

In reply to this post by Andreas.Raab

2010/1/20 Andreas Raab <[hidden email]>:

>
> Igor Stasenko wrote:
>>
>> 2010/1/20 Andreas Raab <[hidden email]>:
>>>
>>> The platform code reads the allocates memory and reads the image. It can
>>> do
>>> this by reading a file or (in the Android case) by doing a series of JNI
>>> calls that pass uncompressed portions of a chunked image file from the
>>> app
>>> package (yes, it is complicated on Android). Or the support code could
>>> read
>>> it from a socket connection. Whatever. There is no reason for the
>>> interpreter to be involved *at all* in reading the image file.
>>>
>> Well, the current code implies some logic when decoding a header, like
>> allocating enough space + breathing room for object memory.
>
> That's a problem the support code already deals with. The support code
> already knows the limits on memory that the user specified (via -memory
> argument or implicitly via mmap() or similar means) and it also already
> needs to check that the image can actually fit into the heap (or else it'll
> blow up when trying to read it). So it's similarly trivial for the support
> code to say:
>
> if(heapSize < fsize(imageFile) + headRoom)
> abort("Not enough memory to load image");
>
>> So, i thought that its better when interpreter controls the memory
>> allocation size, not platform code alone.
>
> The platform already completely controls the allocation. It's only that we
> allow the interpreter to make the first request and that request is based on
> what we've passed in as desiredHeapSize from the support code. Lots of
> useless complexity, the support code might as well just allocate it without
> involving the interpreter.
>

Support code may not know the image size beforehead.
Or image file could be combined with something else into a single file
, which makes it much larger than actually need for image.

You'll never know how much space you need before loading a header and
determining the object memory size. The desizered heap size could be
actually a maximum heap size available to allocate but not optimal nor
minimal. That's why i think its worth to leave allocation at VM side,
not in support code, as long as you leaving header reading there.

> Cheers,
> - Andreas
>
>>
>>> Cheers,
>>> - Andreas
>>>
>>>> No file access, no squeakFileOffsetType mess. Nothing.
>>>> A platform code deciding these details by own!
>>>>
>>>> 2010/1/20 Andreas Raab <[hidden email]>:
>>>>>
>>>>> David T. Lewis wrote:
>>>>>>
>>>>>> On Tue, Jan 19, 2010 at 02:30:43PM -0800, Andreas Raab wrote:
>>>>>>>
>>>>>>> What I'm proposing is an additional VM entry point, called
>>>>>>>
>>>>>>> Interpreter>>readImageFromHeap: heapStart size: heapSize
>>>>>>>
>>>>>>> which takes to arguments: The pointer to the beginning of the
>>>>>>> (pre-allocated) memory portion where the image file has been loaded
>>>>>>> or
>>>>>>> mapped and a size for the allocated memory portion. The VM then
>>>>>>> assumes
>>>>>>> that beginning at heapStart there is an image provided that it needs
>>>>>>> to
>>>>>>> properly prepare (i.e., perform byte-reversal, pointer adjustment
>>>>>>> etc).
>>>>>>
>>>>>> A suggestion:
>>>>>>
>>>>>> Interpreter>>installImage: address size: imageSize header: header
>>>>>
>>>>> I don't like that. It implies that the support code needs to be able to
>>>>> dissect the image file to find out how much is header and how much is
>>>>> data.
>>>>> That implies the support code needs to know about image format, byte
>>>>> reversal, and several other things. The interpreter already does that
>>>>> so
>>>>> why
>>>>> should we have to duplicate that in the support code again?
>>>>>
>>>>>> Rationale:
>>>>>>
>>>>>> - "Install" is more appropriate that "read", since the contents of the
>>>>>> image file have already been read.
>>>>>>
>>>>>> - "Heap" is not relevant.
>>>>>
>>>>> Fair enough. I don't really care what it's called.
>>>>>
>>>>>> - The file header is logically distinct from from the contents of the
>>>>>> image proper, and since someone else did the reading, it seems better
>>>>>> to pass them as two distinct things (*). The pointer to header is
>>>>>> presumed to be a memory block of at least 64 bytes, so in the usual
>>>>>> case of a 32 bit image read into a block of malloc'ed space, we
>>>>>> expect
>>>>>> that address == header + 64.
>>>>>
>>>>> See above. You're making it artificially harder for the support code by
>>>>> requiring it to dissect the file and figure out what the header is.
>>>>> There
>>>>> is
>>>>> really no point to that.
>>>>>
>>>>>> My opinion: Adding the entry point is easy to do and has a clear
>>>>>> benefit
>>>>>> for at least one platform of interest, so we should do this. Removing
>>>>>> the existing entry point is debatable and can be done at a later time.
>>>>>
>>>>> Why do you think removing the existing entry point is debatable? I
>>>>> can't
>>>>> see
>>>>> a long-term benefit for using readImageFromFile: given that the new
>>>>> entry
>>>>> point trivially subsumes the old one.
>>>>>
>>>>> Cheers,
>>>>> - Andreas
>>>>>
>>>>
>>>>
>>
>>
>>
>

--
Best regards,
Igor Stasenko AKA sig.

Andreas.Raab

Re: [Vm-dev] [Fwd: Image file loading]

Igor Stasenko wrote:
>> The platform already completely controls the allocation. It's only that we
>> allow the interpreter to make the first request and that request is based on
>> what we've passed in as desiredHeapSize from the support code. Lots of
>> useless complexity, the support code might as well just allocate it without
>> involving the interpreter.
>>
>
> Support code may not know the image size beforehead.

Yes it does. The support code does the reading after all. Since an
entire image file needs to be read before it can be interpreted, there
is no way that the support code wouldn't know its size. Even if you ship
it over the wire (say in a http request) all protocols that I'm aware
about have information that encode the size of its payload
(Content-Length in http). The fact that the dataLength is also part of
the header should be seen as a duplication of information that can be
used by the interpreter to verify the integrity of the image. It doesn't
mean that this will be the only source of information about the size of
the image.

> Or image file could be combined with something else into a single file
> , which makes it much larger than actually need for image.

If you need that, you would implement the means to recognize and
reconcile the situation appropriately. For example, on Android I'm not
loading the entire package into memory. I'm extracting portions of the
zip file and load them into proper locations. I'm doing this because I'm
in a non-standard situation that requires adjusting for its tradeoffs.

However, we know that the "standard situation" is one in which loading
the entire image file is entirely appropriate. Moreoever, we know that
regardless of the method of I/O we're using we always need to produce a
complete image file first. Since we know that, and since the method of
I/O will differ depending on the context in which your VM runs in, it is
not meaningful to worry about situations that you describe - nobody
would load more than they'd have to unless that's considered a
reasonable tradeoff (there may be situations where this is actually
advantageous).

There is really nothing to worry about here. The standard situation is
that we can read an entire image file. The non-standard situations have
to be dealt with on a case-by-case basis but in any case they'll have a
means of identifying the image, loading into memory and providing it to
the interpreter. There is no reason to worry about someone writing the
support code just randomly passing excess bytes to the interpreter.

Cheers,
- Andreas

Igor Stasenko

Re: [Vm-dev] [Fwd: Image file loading]

2010/1/20 Andreas Raab <[hidden email]>:

>
> Igor Stasenko wrote:
>>>
>>> The platform already completely controls the allocation. It's only that
>>> we
>>> allow the interpreter to make the first request and that request is based
>>> on
>>> what we've passed in as desiredHeapSize from the support code. Lots of
>>> useless complexity, the support code might as well just allocate it
>>> without
>>> involving the interpreter.
>>>
>>
>> Support code may not know the image size beforehead.
>
> Yes it does. The support code does the reading after all. Since an entire
> image file needs to be read before it can be interpreted, there is no way
> that the support code wouldn't know its size. Even if you ship it over the
> wire (say in a http request) all protocols that I'm aware about have
> information that encode the size of its payload (Content-Length in http).
> The fact that the dataLength is also part of the header should be seen as a
> duplication of information that can be used by the interpreter to verify the
> integrity of the image. It doesn't mean that this will be the only source of
> information about the size of the image.
>
>> Or image file could be combined with something else into a single file
>> , which makes it much larger than actually need for image.
>
> If you need that, you would implement the means to recognize and reconcile
> the situation appropriately. For example, on Android I'm not loading the
> entire package into memory. I'm extracting portions of the zip file and load
> them into proper locations. I'm doing this because I'm in a non-standard
> situation that requires adjusting for its tradeoffs.
>
> However, we know that the "standard situation" is one in which loading the
> entire image file is entirely appropriate. Moreoever, we know that
> regardless of the method of I/O we're using we always need to produce a
> complete image file first. Since we know that, and since the method of I/O
> will differ depending on the context in which your VM runs in, it is not
> meaningful to worry about situations that you describe - nobody would load
> more than they'd have to unless that's considered a reasonable tradeoff
> (there may be situations where this is actually advantageous).
>
> There is really nothing to worry about here. The standard situation is that
> we can read an entire image file. The non-standard situations have to be
> dealt with on a case-by-case basis but in any case they'll have a means of
> identifying the image, loading into memory and providing it to the
> interpreter. There is no reason to worry about someone writing the support
> code just randomly passing excess bytes to the interpreter.
>

Well, i mainly meant some kind of a network startup scenario, when
image data is feeded directly from socket.
Platform code could pass the socket to interpreter (as void* object),
and then wait for allocation & read requests which will follow.
And so, you don't have to use some intermediate memory buffer to
preload an image and only then pass
it to interpreter entry point.

> Cheers,
> - Andreas
>

--
Best regards,
Igor Stasenko AKA sig.

Andreas.Raab

Re: [Vm-dev] [Fwd: Image file loading]

Igor Stasenko wrote:
> Well, i mainly meant some kind of a network startup scenario, when
> image data is feeded directly from socket.
> Platform code could pass the socket to interpreter (as void* object),
> and then wait for allocation & read requests which will follow.
> And so, you don't have to use some intermediate memory buffer to
> preload an image and only then pass
> it to interpreter entry point.

But Igor, unless you are *trying* to be obscure why would your network
protocol (that the support code needs to implement anyway) not include
any information about how many bytes are to come? See my previous
example about http - you could trivially launch an image by giving it an
HTTP url; download the image into memory and launch the interpreter
without having to involve the interpreter at all.

This is *necessarily* true for *all* I/O operations that load an image.
There simply is no point to pass a void* to the interpreter only have
it passed back as a void* to the support code. If you're doing that, why
not load the thing right away instead of unnecessarily passing pointers
back and forth?

Cheers,
- Andreas

Igor Stasenko

Re: [Vm-dev] [Fwd: Image file loading]

2010/1/20 Andreas Raab <[hidden email]>:

>
> Igor Stasenko wrote:
>>
>> Well, i mainly meant some kind of a network startup scenario, when
>> image data is feeded directly from socket.
>> Platform code could pass the socket to interpreter (as void* object),
>> and then wait for allocation & read requests which will follow.
>> And so, you don't have to use some intermediate memory buffer to
>> preload an image and only then pass
>> it to interpreter entry point.
>
> But Igor, unless you are *trying* to be obscure why would your network
> protocol (that the support code needs to implement anyway) not include any
> information about how many bytes are to come? See my previous example about
> http - you could trivially launch an image by giving it an HTTP url;
> download the image into memory and launch the interpreter without having to
> involve the interpreter at all.
>
> This is *necessarily* true for *all* I/O operations that load an image.
> There simply is no point to pass a void* to the interpreter only have it
> passed back as a void* to the support code. If you're doing that, why not
> load the thing right away instead of unnecessarily passing pointers back and
> forth?
>

because of image header parsing logic, which ,as you said should stay
in interpreter.
Why one should wait , downloading the whole image, only after that to
discover that image format is not compatible
with interpreter, or not enough memory etc etc?

> Cheers,
> - Andreas
>

--
Best regards,
Igor Stasenko AKA sig.