Direct pointer vs pointerForOop()

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Direct pointer vs pointerForOop()

Nicolas Cellier
 
some implementations use int to pass a pointer, that clearly does not work in 64bits...

What we need is to pass a buffer which is a Smalltalk ByteArray (or WordArray or something), in other word an oop, (stored into a sqInt/usqInt).

In Cog/Spur, an oop is a direct machine pointer, so we can avoid a translation and that's why some implementation directly declare a void* parameter.

In legacy interpreter VM, an oop is always an sqInt, but not always a pointer. For example a 32bits image can run on a VM compiled for 64bits, and in this case, sqInt is 32-bit wide, while the pointer is 64bits. So we used to pass thru pointerForOop. Some implementations do that.

So what we have now is a big mix of partly compatible/uncompatible code. We have to fix the int buf, but what should we better use, pointerForOop(sqInt buf) or direct void *buf?

Nicolas


Reply | Threaded
Open this post in threaded view
|

Re: Direct pointer vs pointerForOop()

David T. Lewis
 
On Sat, Apr 07, 2018 at 02:25:18PM +0200, Nicolas Cellier wrote:

>  
> Hi all,
> I was about fixing
> https://github.com/OpenSmalltalk/opensmalltalk-vm/issues/240
> some implementations use int to pass a pointer, that clearly does not work
> in 64bits...
>
> What we need is to pass a buffer which is a Smalltalk ByteArray (or
> WordArray or something), in other word an oop, (stored into a sqInt/usqInt).
>
> In Cog/Spur, an oop is a direct machine pointer, so we can avoid a
> translation and that's why some implementation directly declare a void*
> parameter.
>
> In legacy interpreter VM, an oop is always an sqInt, but not always a
> pointer. For example a 32bits image can run on a VM compiled for 64bits,
> and in this case, sqInt is 32-bit wide, while the pointer is 64bits. So we
> used to pass thru pointerForOop. Some implementations do that.
>
> So what we have now is a big mix of partly compatible/uncompatible code. We
> have to fix the int buf, but what should we better use, pointerForOop(sqInt
> buf) or direct void *buf?
>
> Nicolas

It is good to maintain a clear distinction between object pointers
and "C" pointers.

There were (and still are) a number of plugins that never got updated
to be 32/64 bit clean. These still have problems such as pointers stored
in int fields. For these cases, it is best to explicitly declare data types
for things like pointers and structs. Once that is done properly, the
rest of the problems go away.

I have long thought that it would be a good idea for purposes of clarity
to have a separate "sqObjectPointer" data type distinct from sqInt and
usqInt. This would make the intentions clear, and I think it would make
the VM code easier to read and understand. But it would probably be a
lot of work to do this throughout the VM.

One thing that I did do was write MemoryAccess as a replacement for the
C macros. One reason for this was to document the functions, because I
always have a hard time remembering what the various macros actually do.
For example, in the case of the pointerForOop() macro, it looks like this:

pointerForOop: oop
        "Answer the machine address of the object to which oop refers. This method maps
        object memory locations to their underlying machine addresses (C pointers).

        char *pointerForOop(usqInt oop) { return sqMemoryBase + oop; }"

        <inline: true>
        <returnTypeC: 'char *'>
        <var: #oop type: 'usqInt'>
        ^ self sqMemoryBaseAddress + oop


Dave

Reply | Threaded
Open this post in threaded view
|

Re: Direct pointer vs pointerForOop()

K K Subbu
 
On Saturday 07 April 2018 08:02 PM, David T. Lewis wrote:
> I have long thought that it would be a good idea for purposes of clarity
> to have a separate "sqObjectPointer" data type distinct from sqInt and
> usqInt. This would make the intentions clear, and I think it would make
> the VM code easier to read and understand. But it would probably be a
> lot of work to do this throughout the VM.

While I realize OOP is strongly entrenched in literature, terms like
pointer, address have strong connotations of machine architecture. If
you are going to introduce a new type, I would suggest neutral terms
like sqObjectId, sqObjectIndex or sqObjectHandle.

How about using a union to pass them across Object and Memory layers and
avoid all those horrid casts?

typedef union {
        u32           asId;
        void         *asAddress;
        unsigned int  asWord;
        unsigned long asLongWord;
        unsigned char asBytes[sizeof(void *)];
        ...
} sqObjectId;

Object layer can access it as oop.asId and the Memory layer can use
asWord or asAddress after suitable conversion:

sqObjectId pointerForOop(sqObjectId oop)
{return (oop.asAddress = oop.asId + sqMemoryBase, oop);}

sqObjectId oopFromPointer(sqObjectId oop)
{return (oop.asId = sqMemoryBase - oop.asAddr, oop); }

Regards .. Subbu
Reply | Threaded
Open this post in threaded view
|

Re: Direct pointer vs pointerForOop()

David T. Lewis
 
On Sat, Apr 07, 2018 at 11:28:38PM +0530, K K Subbu wrote:

>
> On Saturday 07 April 2018 08:02 PM, David T. Lewis wrote:
> >I have long thought that it would be a good idea for purposes of clarity
> >to have a separate "sqObjectPointer" data type distinct from sqInt and
> >usqInt. This would make the intentions clear, and I think it would make
> >the VM code easier to read and understand. But it would probably be a
> >lot of work to do this throughout the VM.
>
> While I realize OOP is strongly entrenched in literature, terms like
> pointer, address have strong connotations of machine architecture. If
> you are going to introduce a new type, I would suggest neutral terms
> like sqObjectId, sqObjectIndex or sqObjectHandle.
>
> How about using a union to pass them across Object and Memory layers and
> avoid all those horrid casts?
>
> typedef union {
> u32           asId;
> void         *asAddress;
> unsigned int  asWord;
> unsigned long asLongWord;
> unsigned char asBytes[sizeof(void *)];
> ...
> } sqObjectId;
>
> Object layer can access it as oop.asId and the Memory layer can use
> asWord or asAddress after suitable conversion:
>
> sqObjectId pointerForOop(sqObjectId oop)
> {return (oop.asAddress = oop.asId + sqMemoryBase, oop);}
>
> sqObjectId oopFromPointer(sqObjectId oop)
> {return (oop.asId = sqMemoryBase - oop.asAddr, oop); }
>

Hi Subbu,

I was unclear in my meaning when I said "data type" for object pointers.
I really only meant something like "#typedef usqInt sqObjectPointer" such
that "sqObjectPointer" could be used when the intent is an object pointer,
and "usqInt" could be used when the intent is an unsigned integer. Nothing
would actually change, but it might make the VM code a bit easier to read
and understand.

With respect to defining data types, I think that Nicolas has already
handled the most important case with the introduction of sqIntptr and
usqIntptr. This clarifies the usage a lot, and cleans up a lot of problems
across different compilers and platforms.

You are right that "object pointer" is a confusing term for people who may
be thinking in terms of C pointers. We also have a fair amount of confusion
in the use of term "word". But in a very real sense, an object pointer
really is a pointer (not just an identifier), even if that does mean
something different from the term "pointer" in C. I don't know, maybe
"object reference" instead of "object pointer"?

Dave

Reply | Threaded
Open this post in threaded view
|

Re: Direct pointer vs pointerForOop()

K K Subbu
 
On Sunday 08 April 2018 02:34 AM, David T. Lewis wrote:

> I was unclear in my meaning when I said "data type" for object pointers.
> I really only meant something like "#typedef usqInt sqObjectPointer" such
> that "sqObjectPointer" could be used when the intent is an object pointer,
> and "usqInt" could be used when the intent is an unsigned integer. Nothing
> would actually change, but it might make the VM code a bit easier to read
> and understand.

Your mail was very clear. But the new type is insufficient to handle all
platform variations or suppress compiler warnings. An object pointer
implementation could use direct address (void *), address offset
(size_t) or table index (unsigned int). Compilers warn if we mix these
in the same expression.

> With respect to defining data types, I think that Nicolas has already
> handled the most important case with the introduction of sqIntptr and
> usqIntptr. This clarifies the usage a lot, and cleans up a lot of problems
> across different compilers and platforms.

This is a really good fix, but it won't stop compilers warning about
mixing ints/pointers :-(. sqMemoryAccess.h has many comments like:
-----
sqInt is a signed integer with size adequate for holding an Object
Oriented Pointer (or immediate value) ....
---

Such intents can be captured in directly in a single union.
Arch-independent code could use oop or oop.asOop while different
implementations could choose different representations (oop.asAddr,
oop.asOffset, oop.asInt, etc) without having to use casts. I have tried
using both casts and unions in my other projects, and the ones with
unions proved to be compact, readable and compiled without type warnings.

> You are right that "object pointer" is a confusing term for people who may
> be thinking in terms of C pointers. We also have a fair amount of confusion
> in the use of term "word". But in a very real sense, an object pointer
> really is a pointer (not just an identifier), even if that does mean
> something different from the term "pointer" in C. I don't know, maybe
> "object reference" instead of "object pointer"?
Excellent suggestion! sqObjectRef fits neatly into object graph model.

Members like word, byte etc. are only used within platform
implementation code where their meaning would be clear. Higher layers
would only use oop.

Regards .. Subbu
Reply | Threaded
Open this post in threaded view
|

Re: Direct pointer vs pointerForOop()

Eliot Miranda-2
 
Hi Subbu, Hi David,


> On Apr 8, 2018, at 3:30 AM, K K Subbu <[hidden email]> wrote:
>
>> On Sunday 08 April 2018 02:34 AM, David T. Lewis wrote:
>>
>> I was unclear in my meaning when I said "data type" for object pointers.
>> I really only meant something like "#typedef usqInt sqObjectPointer" such
>> that "sqObjectPointer" could be used when the intent is an object pointer,
>> and "usqInt" could be used when the intent is an unsigned integer. Nothing
>> would actually change, but it might make the VM code a bit easier to read
>> and understand.

I disagree. That isn't useful since sqInt doesn't mean "machine word", it means "integer containing an oop" and so adding sqObjectPointer is simply noise.

>
> Your mail was very clear. But the new type is insufficient to handle all platform variations or suppress compiler warnings. An object pointer implementation could use direct address (void *), address offset (size_t) or table index (unsigned int). Compilers warn if we mix these in the same expression.

Duh. Come on, we know this.  It isn't the issue.  The interface files in platforms/Cross/plugins/XXXPlugin/XXXPlugin.h are free to specify one or the other (but should clearly not mix both).  If they specify integers then the cast to pointer has to happen in the platform implementations, leading to more casts.  If they specify pointers then the cast must happen in the generated plugin code.  I know which I prefer but fixing the warning means fixing the interface file and the plugin code in VMMaker and VMMaker.oscog since we've made zero progress on merging these two packages.

>
>> With respect to defining data types, I think that Nicolas has already
>> handled the most important case with the introduction of sqIntptr and
>> usqIntptr. This clarifies the usage a lot, and cleans up a lot of problems
>> across different compilers and platforms.

Right.  But (pedantry) FYI it is sqIntptr_t.

>
> This is a really good fix, but it won't stop compilers warning about mixing ints/pointers :-(. sqMemoryAccess.h has many comments like:
> -----
> sqInt is a signed integer with size adequate for holding an Object Oriented Pointer (or immediate value) ....
> ---
>
> Such intents can be captured in directly in a single union. Arch-independent code could use oop or oop.asOop while different implementations could choose different representations (oop.asAddr, oop.asOffset, oop.asInt, etc) without having to use casts. I have tried using both casts and unions in my other projects, and the ones with unions proved to be compact, readable and compiled without type warnings.

No.  Using a union requires a huge rewrite of all of the platform code and VMMaker.

A feasible change is to replace sqInt globally with e.g. sqOop where sqOop is an anonymous pointer:

typedef sqOop struct _opaque_object *;

and then rewrite all arithmetic in VMMaker to cast from sqOop to sqInt, where sqInt is defined as "machine word", and discard the 64/32 and 32/64 configurations of VMMaker  as unsupportable.  But I don't the no David is happy about doing that.

In any case using the union is a red herring.  The issue is deciding whether an interface takes an integral argument or a pointer one and sticking to it, casting as appropriate on either side.  The secondary issue is that Slang chose the B route of calling everything an integer, which has costs and benefits, just like the alternative choices.

>
>> You are right that "object pointer" is a confusing term for people who may
>> be thinking in terms of C pointers. We also have a fair amount of confusion
>> in the use of term "word". But in a very real sense, an object pointer
>> really is a pointer (not just an identifier), even if that does mean
>> something different from the term "pointer" in C. I don't know, maybe
>> "object reference" instead of "object pointer"?

An object pointer is only a pointer if we're prepared to drop support for 64/32 32/64 VMMaker configurations.  This is sane, David, but you'll have to sacrifice it.  Spur was architected on this notion. If I were free to make the decision I would

- introduce
    typedef sqOop struct _opaque_object *;
- rewrite sqMemoryAccess.h appropriately
- make sqOop the default argument and parameter type in VMMaker (but not the default return type; that must remain sqInt because primitives don't return results; they put them in the stack explicitly (a different mess))
- make sqInt & usqInt synonymous with sqIntptr_t (i.e. always a machine word), & hence eliminate sqIntptr_t in favour of usqInt

And hang any platform that has different pointer and register sizes.  The DEC 10 and its ilk have died long ago and the MIPS in the PlayStation 2 has too.  Nowadays wide registers are CI fined to special instruction sets (SSE et al) and the general purpose registers are always pointer-sized.

But even this simple rational proposal, which simply says that an oop is a tagged pointer and if you're going to do arithmetic on it you're going to have to cast, is a /lot/ of work in VMMaker.

> Excellent suggestion! sqObjectRef fits neatly into object graph model.
>
> Members like word, byte etc. are only used within platform implementation code where their meaning would be clear. Higher layers would only use oop.
>
> Regards .. Subbu
Reply | Threaded
Open this post in threaded view
|

Re: Direct pointer vs pointerForOop()

K K Subbu
 
On Sunday 08 April 2018 04:51 PM, Eliot Miranda wrote:

>> This is a really good fix, but it won't stop compilers warning
>> about mixing ints/pointers:-(. sqMemoryAccess.h has many comments
>> like: ----- sqInt is a signed integer with size adequate for
>> holding an Object Oriented Pointer (or immediate value) .... ---
>>
>> Such intents can be captured in directly in a single union.
>> Arch-independent code could use oop or oop.asOop while different
>> implementations could choose different representations (oop.asAddr,
>> oop.asOffset, oop.asInt, etc) without having to use casts. I have
>> tried using both casts and unions in my other projects, and the
>> ones with unions proved to be compact, readable and compiled
>> without type warnings.
> No.  Using a union requires a huge rewrite of all of the platform
> code and VMMaker.
>
> A feasible change is to replace sqInt globally with e.g. sqOop where
> sqOop is an anonymous pointer:
>
> typedef sqOop struct _opaque_object *;

This won't help eliminate compiler warnings about mixing int/pointers.
We should be taking such warnings seriously as they could hide bugs. The
problem in OP happens when sizeof(int) < sizeof(void *). size_t does not
face the problem. Why not

typedef size_t sqOop;

Now the compiler will catch sqInt/sqOop mixups. We can then use macros
(shortened to fit mail line) to convey intent :

#define i2oop(n) ((sqMemoryBase - (char *)0) + (n))
#define addrAt(oop) ((char *)0 + (oop))
#define oop2i(oop) (addrAt(oop) - sqMemoryBase)

will work on 32image/32vm, {32,64}image/64vm

Regards .. Subbu
Reply | Threaded
Open this post in threaded view
|

Re: Direct pointer vs pointerForOop()

Eliot Miranda-2
 
Hi Subbu,

> On Apr 9, 2018, at 1:50 AM, K K Subbu <[hidden email]> wrote:
>
> On Sunday 08 April 2018 04:51 PM, Eliot Miranda wrote:
>>> This is a really good fix, but it won't stop compilers warning
>>> about mixing ints/pointers:-(. sqMemoryAccess.h has many comments
>>> like: ----- sqInt is a signed integer with size adequate for
>>> holding an Object Oriented Pointer (or immediate value) .... ---
>>> Such intents can be captured in directly in a single union.
>>> Arch-independent code could use oop or oop.asOop while different
>>> implementations could choose different representations (oop.asAddr,
>>> oop.asOffset, oop.asInt, etc) without having to use casts. I have
>>> tried using both casts and unions in my other projects, and the
>>> ones with unions proved to be compact, readable and compiled
>>> without type warnings.
>> No.  Using a union requires a huge rewrite of all of the platform
>> code and VMMaker.
>> A feasible change is to replace sqInt globally with e.g. sqOop where
>> sqOop is an anonymous pointer:
>> typedef sqOop struct _opaque_object *;
>
> This won't help eliminate compiler warnings about mixing int/pointers. We should be taking such warnings seriously as they could hide bugs.

Agreed.  But they are fixed by rewriting the appropriate code in VMMaker.oscog of the plugin packages or the platform C code.

> The problem in OP happens when sizeof(int) < sizeof(void *). size_t does not face the problem. Why not
>
> typedef size_t sqOop;
>
> Now the compiler will catch sqInt/sqOop mixups. We can then use macros (shortened to fit mail line) to convey intent :
>
> #define i2oop(n)    ((sqMemoryBase - (char *)0) + (n))
> #define addrAt(oop)    ((char *)0 + (oop))
> #define oop2i(oop)    (addrAt(oop) - sqMemoryBase)
>
> will work on 32image/32vm, {32,64}image/64vm

Ugh.  I don't want to support sqMemoryBase at all.  It's yet more effort for little return.

>
> Regards .. Subbu