Simulating FFI Calls [was Re: VMMaker simulation - strlen, strcpy, getenv and FakeStdinStream]


Eliot Miranda-2
 


On Sun, Oct 14, 2018 at 1:03 PM Alistair Grant <[hidden email]> wrote:
 
Hi All,

Just in case there aren't automatic notifications of submissions to
VMMakerInbox...

I've submitted the following changes (more information after the descriptions):

Name: VMMaker.oscog-AlistairGrant.2455
Author: AlistairGrant
Time: 14 October 2018, 8:59:01.383815 pm
UUID: 9e8e4134-b30b-4734-9477-95d556650155
Ancestors: VMMaker.oscog-eem.2454

VMClass strlen, strncpy and getenv

Pharo stores UTF-8 encoded strings in ByteArrays (ByteString, strictly
speaking, expects to store only characters that can be represented as
a single byte in UTF-8, e.g. ASCII).  ByteArrays are also used within
the simulator to represent buffers allocated by the simulator.  As
such, a string may either fill the whole ByteArray or be shorter than
the ByteArray and null terminated.

These changes extend strlen: and strncpy:_:_: to handle ByteArrays and
add some tests (tests for strings in the object memory are todo).
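A rough illustration of the two cases described above (a string that fills its buffer versus one that is shorter and null terminated), written in Python rather than Slang so it can be run standalone. The names sim_strlen and sim_strncpy are hypothetical and are not the VMClass methods themselves; the padding behaviour follows C's strncpy.

```python
def sim_strlen(buf):
    """Length up to the first NUL byte, or the whole buffer if there is none."""
    nul = buf.find(0)
    return len(buf) if nul < 0 else nul


def sim_strncpy(dest, src, n):
    """Copy at most n bytes of the string in src into dest, NUL-padding to n."""
    if n > len(dest):
        raise ValueError("n exceeds the destination buffer")
    count = min(sim_strlen(src), n)
    dest[:count] = src[:count]
    dest[count:n] = bytes(n - count)   # pad the remainder, as C strncpy does
    return dest


full = bytearray(b"abc")               # string fills the buffer, no terminator
padded = bytearray(b"ab\x00zz")        # string is shorter than its buffer
target = sim_strncpy(bytearray(b"\xff" * 6), b"hi\x00junk", 4)
```

Here sim_strlen(full) is 3 and sim_strlen(padded) is 2, so one routine covers both representations.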

InterpreterPrimitives>>primitiveGetenv: returned nil rather than 0 in
the simulator when a variable that isn't defined is requested.


Name: VMMaker.oscog-AlistairGrant.2456
Author: AlistairGrant
Time: 14 October 2018, 9:25:11.249348 pm
UUID: 47b0319d-df10-4ece-84fc-324d0d35fe1d
Ancestors: VMMaker.oscog-AlistairGrant.2455

FakeStdinStream and FilePluginSimulator do double duty with the #atEnd
flag to allow #sqFile:Read:Into:At: to break out of its loop.  This is
brittle, as any additional call to #atEnd breaks the simulation, and
Pharo makes such calls.

Instead of doing double duty with #atEnd, do the same as the actual
plugin (sqFileReadIntoAt() in sqFilePluginBasicPrims.c) and ignore the
number of bytes to read when input is from stdin (FakeStdinStream) and
only ever read a single byte (fixes the problem and is closer to the
real plugin behaviour).
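The read policy described above can be sketched as follows. This is only an illustration in Python under stated assumptions: read_into and its is_stdin flag are hypothetical names, and the real logic lives in sqFileReadIntoAt() and the simulator classes.

```python
import io


def read_into(stream, buffer, start, count, is_stdin):
    """Read up to count bytes into buffer from index start; return bytes read."""
    count = min(count, len(buffer) - start)   # never write past the buffer
    if is_stdin:
        count = min(count, 1)                 # stdin: ignore the request size
    data = stream.read(count)
    buffer[start:start + len(data)] = data
    return len(data)


stdin_buf = bytearray(8)
stdin_read = read_into(io.BytesIO(b"hello"), stdin_buf, 0, 5, True)
file_buf = bytearray(8)
file_read = read_into(io.BytesIO(b"hello"), file_buf, 2, 5, False)
```

With this shape the caller needs no side channel such as #atEnd: the return value alone says how much arrived.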


Background:

I've successfully got a Pharo 7 image (with FileAttributesPlugin)
running StdioListener in the VM simulator, but have made a few changes
at each level:

- Created a modified Pharo image that avoids FFI:
-- It doesn't load FreeType fonts or Iceberg to avoid FFI callouts.
-- Uses UnixOSProcessPlugin>>primitiveGetCurrentWorkingDirectory
instead of an FFI callout.
- StdioListener has changes to work with Zinc streams
- VMMaker has changes (in addition to the patches above):
-- Added UnixOSProcessPluginSimulator so that
primitiveGetCurrentWorkingDirectory can be used instead of FFI.

Can someone tell me whether FFI is expected to work within the simulator?

As yet, no.  But in theory it may be possible.  The way in which primitives are dispatched in the simulator (dispatchFunctionPointer:) could also be used to implement FFI calls (although not particularly safely).  When I added support for PrimErrFFIException I wrote ThreadedFFIPlugin>>dispatchFunctionPointer:[with:with:with:with:[with:with:]] in such a way that the call could be simulated.  Simulating it would require invoking an FFI call itself (i.e. via ExternalFunction>>invokeWithArguments:) to do the actual call-out.  But to do that we would need a derived pointer type, so that e.g. an address in the heap (the memory inst var of SpurMemoryManager/ObjectMemory) could be passed as an argument; such an address is an offset into the memory ByteArray.  So we're close, but still need some infrastructure (and derived pointers are generally useful anyway).

Clearly such an implementation is unsafe; but increasingly we will need to simulate FFI calls if the simulator is to continue being useful.

Thanks,
Alistair


--
_,,,^..^,,,_
best, Eliot

Re: Simulating FFI Calls [was Re: VMMaker simulation - strlen, strcpy, getenv and FakeStdinStream]

Ben Coman
 
On Mon, 15 Oct 2018 at 07:21, Eliot Miranda <[hidden email]> wrote:
 
But to do that we would need a derived pointer type

My usual searches... "define derived pointer" and "what is derived pointer" are not being helpful.
All results seem to be about pointers to C++ derived classes, which I guess is not what you're referring to.
Can you educate me on this...?

cheers -ben

Re: Simulating FFI Calls [was Re: VMMaker simulation - strlen, strcpy, getenv and FakeStdinStream]

Eliot Miranda-2
 
Hi Ben,

By “derived” I mean a pointer to some point inside an object, not a pointer to the start of an object.  When one passes e.g. a ByteArray through the FFI to a reference parameter, the marshaling code ends up passing a pointer to the start of the object.  That won’t work if what we want to do is simulate passing a pointer to the start of a simulation object that actually lives at an offset inside the large ByteArray that constitutes the entire heap in the simulation.  So we need to be able to express a (ByteArray, offset) pair, pass that through the FFI to a reference parameter, and have the marshaling code pass a derived pointer, i.e. the start of the ByteArray plus the offset, which is a pointer to the start of the simulation object within the heap ByteArray.
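To make the idea concrete outside the VM, here is a small Python sketch using ctypes, where a bytearray stands in for the simulator's heap ByteArray. It is an analogy only, not the Squeak FFI: derived_pointer and the offset value are hypothetical, and the address stays valid only while the bytearray is not resized.

```python
import ctypes

heap = bytearray(64)                      # stands in for the simulated heap
offset = 16                               # hypothetical simulated object address
heap[offset:offset + 6] = b"hello\x00"    # a NUL-terminated string inside the heap


def derived_pointer(base, offset):
    """Machine address of base[offset], computed without copying base."""
    if not 0 <= offset < len(base):
        raise IndexError("offset outside the buffer")
    return ctypes.addressof(ctypes.c_char.from_buffer(base, offset))


start = derived_pointer(heap, 0)          # what plain marshaling would pass
inner = derived_pointer(heap, offset)     # base plus offset: the derived pointer
text = ctypes.string_at(inner)            # C-style read starting at that address
```

The difference inner - start equals offset, and a C-style read from inner sees the simulated object rather than the start of the heap.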


Re: Simulating FFI Calls [was Re: VMMaker simulation - strlen, strcpy, getenv and FakeStdinStream]

alistairgrant
 
Hi Eliot,


As a first step to get things going, couldn't we just copy the
ByteArrays to and from the simulation memory?  It would obviously be
much less efficient, but would reduce the number of prerequisites to
get started.
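A minimal sketch of that copy-in, copy-out approach, again in Python with ctypes and a bytearray standing in for the simulation memory. call_with_copy and fake_callout are hypothetical names; fake_callout is a Python stand-in for a foreign function that writes into its argument.

```python
import ctypes


def call_with_copy(heap, offset, size, c_function):
    """Copy heap[offset:offset+size] out, call c_function on it, copy it back."""
    scratch = (ctypes.c_char * size).from_buffer_copy(heap, offset)
    result = c_function(scratch)
    heap[offset:offset + size] = scratch.raw   # write back whatever the callee changed
    return result


def fake_callout(buf):
    """Stand-in for a C function that modifies its buffer argument in place."""
    data = buf.raw.upper()
    ctypes.memmove(buf, data, len(data))
    return len(data)


memory = bytearray(b"....hello....")
returned = call_with_copy(memory, 4, 5, fake_callout)
```

Each call costs two copies of size bytes, which is the inefficiency mentioned above, but it needs no change to the marshaling code.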

Cheers,
Alistair

Re: Simulating FFI Calls [was Re: VMMaker simulation - strlen, strcpy, getenv and FakeStdinStream]

Eliot Miranda-2
 
Hi Alistair,

Maybe, but I expect it is a very small change to marshaling to support this, and it has high value to the FFI in general (think of trying to pass a field embedded in a struct; that's not yet possible; with derived pointers it is; the facility is generally useful).  IME, if one can find a simple solution to a problem that is more general than the alternatives, then pursue it, even if its initial cost may be higher, because it will pay off better in the long run.
 

_,,,^..^,,,_
best, Eliot

Re: Simulating FFI Calls [was Re: VMMaker simulation - strlen, strcpy, getenv and FakeStdinStream]

alistairgrant
 
Hi Eliot,


On Mon, 15 Oct 2018 at 18:55, Eliot Miranda <[hidden email]> wrote:

> Maybe, but I expect it is a very small change to marshaling to support this and has high value to the FFI in general

Cool.  My assumption from the way you originally wrote this was that
it would be relatively expensive to develop.


> (think of trying to pass a field embedded in a struct; that's not yet possible; with derived pointers it is; the facility is generally useful).  IME if one can find a simple more general solution to a problem than some other, then pursue that, even if its initial cost may be higher because it will pay off better in the long run.

I didn't mean to imply that it wouldn't be done, just that it may not
be the first thing done (based on my incorrect assumption that it
would be expensive to develop).

Cheers,
Alistair

Re: Simulating FFI Calls [was Re: VMMaker simulation - strlen, strcpy, getenv and FakeStdinStream]

Eliot Miranda-2
 
Hi Alistair,

Taking a superficial look at the code I see this in ffiAtomicArgByReference:Class:in:

(atomicType = FFITypeVoid or: [(atomicType >> 1) = (FFITypeSignedByte >> 1)]) ifTrue:
    "byte* -- see comment on string above"
    [(isString or: [oopClass = interpreterProxy classByteArray]) ifTrue: "String/Symbol/ByteArray"
        [^self ffiPushPointer: (interpreterProxy firstIndexableField: oop) in: calloutState].
     (oopClass = interpreterProxy classExternalAddress) ifTrue:
        [^self ffiPushPointer: (self longAt: oop + interpreterProxy baseHeaderSize) in: calloutState].
     isAlien ifTrue:
        [^self ffiPushPointer: (self pointerForOop: (self startOfData: oop)) in: calloutState].
     atomicType = FFITypeVoid ifFalse:
        [^FFIErrorCoercionFailed]].
"note: type void falls through"

which could be extended, crudely, with

(oopClass = interpreterProxy classArray
 and: [(interpreterProxy slotSizeOf: oop) = 2
 and: [(interpreterProxy isWordsOrBytes: (obj := interpreterProxy fetchPointer: 0 ofObject: oop))
 and: [(interpreterProxy isInteger: (offset := interpreterProxy fetchPointer: 1 ofObject: oop))
 and: [(offset := interpreterProxy integerValueOf: offset) between: 0 and: (interpreterProxy byteSizeOf: obj)]]]]) ifTrue:
    [^self ffiPushPointer: (interpreterProxy firstIndexableFieldOf: obj) + offset in: calloutState].

and something analogous could be done in ffiPushPointerContentsOf:in:, which is sent via ffiAtomicStructByReference:Class:in:.  But simply passing a derived pointer as a tuple of object, offset might be a poor design.  We might want to add a specific class to the FFI, DerivedPointerArgument or some such.  So I think the implementation is extremely cheap; it is the design that vexes (HHGTTG: You haven't even invented the wheel yet!  Well, no. Maybe you can tell us.   What color should it be?)
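The checks in the crude extension can be restated in Python to show what a valid (object, offset) argument looks like. This is only a model of the validation logic, not plugin code; resolve_derived_argument is a hypothetical name, and the inclusive upper bound mirrors the between: 0 and: test in the snippet.

```python
def resolve_derived_argument(arg):
    """Return (buffer, offset) for a valid two-element pair, otherwise None."""
    if not (isinstance(arg, (list, tuple)) and len(arg) == 2):
        return None                       # must be a two-slot Array
    buffer, offset = arg
    if not isinstance(buffer, (bytes, bytearray)):
        return None                       # first slot must hold raw bytes
    if isinstance(offset, bool) or not isinstance(offset, int):
        return None                       # second slot must be an integer
    if not 0 <= offset <= len(buffer):
        return None                       # offset must lie within the buffer
    return buffer, offset


accepted = resolve_derived_argument([bytearray(8), 3])
rejected = resolve_derived_argument([bytearray(8), 9])
```

A dedicated class such as the DerivedPointerArgument suggested above would replace the first check with a single class test, which is part of the design question raised here.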
 
_,,,^..^,,,_
best, Eliot