FFI | ByteArrays: Authentic or Fabricated? :-)

marcel.taeumel
Hi, all!

How can I figure out whether the handle of an external object -- if it is a byte array -- was fabricated in the image or was actually created in the external library? For actual instances of ExternalAddress, it is obvious that those are meant to point to external memory. But what about instances of ByteArray stored in the "handle" instVar?

I tried Object >> #isPinned. :-D Did not work. I am looking at all the primitives in ByteArray.

... is there actually a difference? Or are all byte arrays that I find in the image actually in the object memory?

Best,
Marcel


Re: FFI | ByteArrays: Authentic or Fabricated? :-)

codefrau
On Wed, May 20, 2020 at 10:00 AM Marcel Taeumel <[hidden email]> wrote:
> Or are all byte arrays that I find in the image actually in the object memory?

Yes they are.

- Vanessa - 



Re: FFI | ByteArrays: Authentic or Fabricated? :-)

Eliot Miranda-2
In reply to this post by marcel.taeumel

Hi Marcel,

On May 20, 2020, at 9:53 AM, Marcel Taeumel <[hidden email]> wrote:

> Hi, all!
>
> How can I figure out whether the handle of an external object -- if it is a byte array -- was fabricated in the image or was actually created in the external library? For actual instances of ExternalAddress, it is obvious that those are meant to point to external memory. But what about instances of ByteArray stored in the "handle" instVar?
>
> I tried Object >> #isPinned. :-D Did not work. I am looking at all the primitives in ByteArray.
>
> ... is there actually a difference? Or are all byte arrays that I find in the image actually in the object memory?

All objects you can find in the image are either in object memory or are immediates (SmallInteger, Character or SmallFloat64).

Various objects can contain values that are pointers to external memory (ExternalAddress, Alien, ByteArray).

A pinned object is merely an object that is in old space (and will have been moved there by a become: on pinning if it was in new space) and, because it has the pinned bit set, will not be moved by the old space compactor. Note that pinning an object does not prevent it from being garbage collected if it is unreferenced.

Before I rabbit on irrelevantly, can you say more about the issues you're tackling?

> Best,
> Marcel

Cheers!
Eliot



Re: FFI | ByteArrays: Authentic or Fabricated? :-)

David T. Lewis
In reply to this post by marcel.taeumel
On Wed, May 20, 2020 at 06:53:37PM +0200, Marcel Taeumel wrote:

> Hi, all!
>
> How can I figure out whether the handle of an external object -- if it is a byte array -- was fabricated in the image or was actually created in the external library? For actual instances of ExternalAddress, it is obvious that those are meant to point to external memory. But what about instances of ByteArray stored in the "handle" instVar?
>
> I tried Object >> #isPinned. :-D Did not work. I am looking at all the primitives in ByteArray.
>
> ... is there actually a difference? Or are all byte arrays that I find in the image actually in the object memory?
>
> Best,
> Marcel
>

There is no simple answer.

A good illustration is this:

        SourceFiles first fileID

On a 64-bit VM, you will see a ByteArray of size 24. On a 32-bit VM, it
is a shorter ByteArray. In either case, the ByteArray instance exists
entirely within the object memory.

The byte values within that ByteArray happen to be the value of a C pointer,
which is the address in the process virtual memory of a data structure that
lives in FilePlugin within the VM. That data structure contains various things,
including (on a Unix VM) another pointer to a FILE struct that lives in the
C runtime library.

None of those pointers or internal things have any meaning within the image
or within the object memory itself. It is best to think of the fileID field
as an opaque handle to something in the external world, and the fact that
the bytes just happen to be a C pointer is something that you are supposed
to not notice.
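
To make the "bytes that happen to be a C pointer" idea concrete, here is a minimal sketch in C. It is not taken from the VM sources: Record, OpaqueHandle, and both functions are hypothetical names used only for illustration. It shows a pointer being copied into a plain byte buffer and recovered again, and why nothing in the bytes themselves says whether the address is still valid.

```c
#include <string.h>

/* Hypothetical stand-in for a VM-side record such as SQFile. */
typedef struct {
    int session_id;
    int payload;
} Record;

/* Opaque handle: just enough bytes to hold one C pointer.
   This plays the role of the ByteArray seen from the image. */
typedef struct {
    unsigned char bytes[sizeof(void *)];
} OpaqueHandle;

/* Copy the pointer's bytes into the handle. */
static OpaqueHandle handle_from_pointer(Record *record)
{
    OpaqueHandle handle;
    memcpy(handle.bytes, &record, sizeof record);
    return handle;
}

/* Recover the pointer on the C side. Nothing here can tell whether the
   address is still valid; that knowledge has to come from elsewhere. */
static Record *pointer_from_handle(const OpaqueHandle *handle)
{
    Record *record;
    memcpy(&record, handle->bytes, sizeof record);
    return record;
}
```

The round trip is lossless, which is exactly why the handle is dangerous to reuse after the pointed-to memory has gone away: the bytes look the same either way.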

I really wish that Andreas could be here to comment. I clearly recall his
shock and dismay on finding out that I was using the actual byte contents of
a fileID to do things in the OSProcess plugin. We had slightly different
perspectives on that topic, but if you were at all interested in issues of
security for the Squeak execution environment (as Andreas was), then you would
want to hear his perspective.

So in some sense, you should not really be able to know if a ByteArray contains
a pointer to something elsewhere in the virtual memory of the VM. On the
other hand, if you already know that you are doing something dangerous and
insecure, then it would be really convenient to be able to answer the question
that you are asking - does this ByteArray object in the object memory contain
a reference to some external thing outside of the object memory, and if so
is it safe for me to use it?

I don't know that there could ever be a safe answer to that question. The
image and the VM have no way of knowing what happens to things at the other
end of that C pointer. So for example in the case of fileID, you really need
to keep track of when a FileStream refers to invalid addresses. Thus the
data structure is:

  /* squeak file record; see sqFilePrims.c for details */
  typedef struct {
    int                    sessionID;     /* ikp: must be first */
    void                  *file;
    squeakFileOffsetType   fileSize;      /* 64-bits we hope. */
  #if defined(ACORN)
  // ACORN has to have 'lastOp' as at least a 32 bit field in order to work
    int lastOp; // actually used to save file position
    char writable;
    char lastChar;
    char isStdioStream;
  #else
    char                   writable;
    char                   lastOp; /* 0 = uncommitted, 1 = read, 2 = write */
    char                   lastChar;
    char                   isStdioStream;
  #endif
  } SQFile;

The first field of the structure is sessionID, which is a value associated with
the currently running VM program. If you save your image and start it again,
the sessionID in the new VM instance will be different, which allows the
FilePlugin to figure out that the pointer to the FILE struct (or to a HANDLE
on Windows) is not valid, and therefore that it should not attempt to
dereference that pointer (which would crash the VM).
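
The session check described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual code in sqFilePrims.c: FileRecord, this_session_id, start_session, and record_is_valid are hypothetical names, and the real plugin may perform additional checks.

```c
#include <stddef.h>

/* Simplified, hypothetical record with the session ID first, as in SQFile. */
typedef struct {
    int   session_id;
    void *file;
} FileRecord;

/* Hypothetical: set once at VM startup, different on every launch. */
static int this_session_id = 0;

static void start_session(int new_id)
{
    this_session_id = new_id;
}

/* A record is only trusted if it was created during the current session.
   A stale record from a saved image fails here before 'file' is touched. */
static int record_is_valid(const FileRecord *record)
{
    return record != NULL
        && record->session_id == this_session_id
        && record->file != NULL;
}
```

A primitive would call record_is_valid first and fail, rather than dereference, when it answers 0.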

This is just one example, but it illustrates the general case, which is that
the VM cannot be expected to keep track of what people are doing on the other
end of those C pointers, and the image in turn cannot be expected to know if
a ByteArray that contains a C pointer is referring to anything useful or
safe on the other end of the pointer that was saved in the ByteArray.

In specific cases, you can consider handling this by keeping track of the
known valid external references. If you look at the Windows FilePlugin, you
will see that Andreas did this by maintaining a registry of known valid
HANDLE values, and failing the primitives when an unregistered HANDLE was
passed, e.g. by my WindowsOSProcessPlugin which attempted to pass unregistered
HANDLE values for anonymous pipes.
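
The registry idea can be sketched in a few lines of C. This is only an illustration of the general technique, not the Windows FilePlugin code; the fixed-size table and all function names are assumptions made for the example.

```c
#include <stddef.h>

#define MAX_HANDLES 64

/* Hypothetical table of handles that this plugin itself handed out. */
static void  *known_handles[MAX_HANDLES];
static size_t known_count = 0;

static int registry_add(void *handle)
{
    if (handle == NULL || known_count == MAX_HANDLES) return 0;
    known_handles[known_count++] = handle;
    return 1;
}

static int registry_contains(const void *handle)
{
    for (size_t i = 0; i < known_count; i++)
        if (known_handles[i] == handle) return 1;
    return 0;
}

static int registry_remove(const void *handle)
{
    for (size_t i = 0; i < known_count; i++) {
        if (known_handles[i] == handle) {
            /* Fill the gap with the last entry; order does not matter. */
            known_handles[i] = known_handles[--known_count];
            return 1;
        }
    }
    return 0;
}

/* A primitive fails instead of touching a handle it never registered. */
static int primitive_use_handle(void *handle)
{
    if (!registry_contains(handle)) return -1;  /* primitive failure */
    return 0;                                   /* safe to proceed */
}
```

The cost is the one described above: a handle created by another plugin is rejected even if it is perfectly valid, because the registry only knows what it was told.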

This was an annoyance for me because I could not pursue my OSProcess hacks
on Windows (and I abandoned the effort). But from a security and system
integrity point of view, Andreas was right. To this day, I do not have
any good answer for how to handle this.

So it is not an easy problem.

Dave


Re: FFI | ByteArrays: Authentic or Fabricated? :-)

marcel.taeumel
Hi Dave, hi Eliot, hi Vanessa!

Thank you very much. Those answers are very helpful!

I am trying to learn about the differences between talking to a C library from a C program and from within Squeak through FFI. I am especially interested in the existing safety nets to rely on and the common patterns to follow when using FFI.

This includes:
- how to care for external structures created through FFI calls
- how to care for external structures created in-image, then passed to FFI calls
- when to use #new or #externalNew (+#free)
- differences between a handle being a ByteArray or an ExternalAddress
- what happens to all my external structures when (re-)starting the image
- ...

In this learning process, I want to double-check whether more clues can be offered through Squeak's tools. Especially if an action would crash the VM.

Latest thing -- that's why this question about ByteArrays -- was how to re-think this code:

MyStruct foo;
someFunctionFillsMyStruct(&foo);

Into this code:

foo := MyStruct new. "handle is ByteArray"
self apiSomeFunctionFillsMyStruct: foo.

Meaning, what would be on the stack in C can conveniently be held in Squeak's object memory to be shared across Squeak processes and applications. No need to use malloc() and free():

MyStruct *foo = malloc(sizeof(MyStruct));
someFunctionFillsMyStruct(foo);
...
free(foo);

Which I can translate to Squeak FFI:

foo := MyStruct externalNew. "handle is ExternalAddress"
self apiSomeFunctionFillsMyStruct: foo. "same method as above! :-)"
...
foo free.

While there is no need to change #apiSomeFunctionFillsMyStruct: for this, Squeak FFI conveniently copies structs from C stack memory to object memory anyway. No need to address the heap from within Squeak. Or is there? Performance?
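
For comparison, the two C variants above can be written out as one compilable sketch. The fields of MyStruct and the values written by someFunctionFillsMyStruct are made up for illustration; the point is only that the callee is identical and the caller decides where the storage lives and who releases it.

```c
#include <stdlib.h>

/* Hypothetical layout; the real MyStruct is whatever the library defines. */
typedef struct {
    int    id;
    double value;
} MyStruct;

/* Stand-in for the library function: fills caller-provided storage. */
static void someFunctionFillsMyStruct(MyStruct *out)
{
    out->id = 1;
    out->value = 2.5;
}

/* Automatic (stack) storage: released when the function returns. */
static double fill_on_stack(void)
{
    MyStruct foo;
    someFunctionFillsMyStruct(&foo);
    return foo.value;
}

/* Heap storage: same callee, but the caller must free explicitly. */
static double fill_on_heap(void)
{
    MyStruct *foo = malloc(sizeof *foo);
    if (foo == NULL) return -1.0;
    someFunctionFillsMyStruct(foo);
    double value = foo->value;
    free(foo);
    return value;
}
```

In the Squeak translation, "MyStruct new" corresponds loosely to the first variant (storage managed for you) and "MyStruct externalNew" plus "free" to the second.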

:-)

Best,
Marcel

Re: FFI | ByteArrays: Authentic or Fabricated? :-)

codefrau

On Thu 21. May 2020 at 00:30, Marcel Taeumel <[hidden email]> wrote:

> Latest thing -- that's why this question about ByteArrays -- was how to re-think this code:
>
> MyStruct foo;
> someFunctionFillsMyStruct(&foo);
>
> Into this code:
>
> foo := MyStruct new. "handle is ByteArray"
> self apiSomeFunctionFillsMyStruct: foo.
>
> Meaning, what would be on the stack in C can conveniently be held in Squeak's object memory to be shared across Squeak processes and applications.

I was under the impression that is exactly how it works.

You just need to make a ByteArray that is large enough to hold the struct. Passing that to FFI will pass a pointer to the first byte of the ByteArray. The API call would fill the ByteArray. So it should Just Work.
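
Seen from the C side, that looks roughly like the sketch below: the callee receives the address of the first byte of a raw buffer and writes a struct into it. The struct fields and values are hypothetical, and the function takes void * and uses memcpy here so that the sketch does not depend on the buffer's alignment; a real library function would normally take MyStruct * directly.

```c
#include <string.h>

/* Hypothetical layout, for illustration only. */
typedef struct {
    int id;
    int count;
} MyStruct;

/* Stand-in for the library function: writes one MyStruct at dest. */
static void someFunctionFillsMyStruct(void *dest)
{
    MyStruct result = { 7, 3 };
    memcpy(dest, &result, sizeof result);
}

/* The byte buffer plays the role of the ByteArray: raw bytes,
   exactly sizeof(MyStruct) long, owned by the caller. */
static int fill_byte_buffer_and_read_count(void)
{
    unsigned char bytes[sizeof(MyStruct)];
    MyStruct view;

    someFunctionFillsMyStruct(&bytes[0]);  /* pointer to the first byte */
    memcpy(&view, bytes, sizeof view);     /* decode the fields again */
    return view.count;
}
```

The field accessors of an external structure in the image do the equivalent of the second memcpy: they decode values at fixed offsets within the bytes.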

- Vanessa -


Re: FFI | ByteArrays: Authentic or Fabricated? :-)

marcel.taeumel
Hi Vanessa!

> You just need to make a ByteArray that is large enough to hold the struct. Passing that to FFI will pass a pointer to the first byte of the ByteArray. The API call would fill the ByteArray. So it should Just Work.

Ah, I thought so. But I did not verify it by looking at the FFI sources. :-)

So, is there any need for #externalNew and #free?

Best,
Marcel

Re: FFI | ByteArrays: Authentic or Fabricated? :-)

codefrau
The old object memory did not have pinning. So if you needed an unchanging address, you had to allocate it externally. 

With Spur’s pinned objects there is less need for external allocations, true. 

However, there is more risk of corrupting your object memory if the ByteArray is not large enough. Externally allocated memory is a little safer in that regard. 
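
The corruption risk comes from the callee not knowing how large the buffer is. A minimal sketch of a defensive wrapper, assuming a hypothetical MyStruct and a hypothetical fill_checked function that is told the buffer size, could look like this:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout, for illustration only. */
typedef struct {
    int id;
    int count;
} MyStruct;

/* Refuses to write unless the destination can hold a whole MyStruct.
   Answers 0 on success and -1 if the buffer is missing or too small. */
static int fill_checked(void *buffer, size_t buffer_size)
{
    MyStruct result = { 7, 3 };

    if (buffer == NULL || buffer_size < sizeof result) return -1;
    memcpy(buffer, &result, sizeof result);
    return 0;
}
```

Most C APIs do not take a size argument like this, so in practice the check has to happen on the calling side, before the call, by comparing the size of the byte array with the size of the struct.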

Also, you may have to think more about what happens after image reload. Then again, that’s tricky with FFI either way. 

- Vanessa -

Re: FFI | ByteArrays: Authentic or Fabricated? :-)

Nicolas Cellier
There is a small difference though.
Spur allocates on 8-byte boundaries, while the OS might allocate on 16-byte boundaries.
Believe it or not, depending on alignment, the accelerated code path taken can differ.
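
To check which case you are in, the address can be tested directly. The sketch below uses two hypothetical helpers, is_aligned and align_up; it assumes the alignment is a nonzero value and that uintptr_t is available, which holds on common platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Answers nonzero if p is a multiple of alignment (alignment > 0). */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Answers the first address at or after p that is a multiple of
   alignment. The caller must make sure the buffer has enough slack. */
static unsigned char *align_up(unsigned char *p, size_t alignment)
{
    uintptr_t remainder = (uintptr_t)p % alignment;
    return remainder == 0 ? p : p + (alignment - remainder);
}
```

An address that is 8-byte aligned is 16-byte aligned only half of the time, so code that picks a faster path for 16-byte-aligned data may behave differently for two otherwise identical buffers.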
