Object format(s) as a contract with VM?

Igor Stasenko

Object format(s) as a contract with VM?

Hello,

I have been thinking about object formats and how to make them more
flexible, and came up with the following idea.

Today's Squeak and Cog VMs use 4 bits in the object header to denote
the object format, so the VM supports 16 possible object formats:

formatOf: oop
"   0      no fields
    1      fixed fields only (all containing pointers)
    2      indexable fields only (all containing pointers)
    3      both fixed and indexable fields (all containing pointers)
    4      both fixed and indexable weak fields (all containing pointers).

    5      unused
    6      indexable word fields only (no pointers)
    7      indexable long (64-bit) fields (only in 64-bit images)

    8-11   indexable byte fields only (no pointers) (low 2 bits are
           low 2 bits of size)
    12-15  compiled methods:
           # of literal oops specified in method header,
           followed by indexable bytes (same interpretation of
           low 2 bits as above)
"

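As a rough illustration of what the format numbers above encode, here
is a minimal C sketch of two predicates over them. The enum and
function names are made up for this example and are not taken from the
VM sources.

```c
#include <stdbool.h>

/* Format numbers as listed in the formatOf: comment.
   The identifiers are hypothetical, chosen for this sketch only. */
enum {
    FMT_NO_FIELDS       = 0,
    FMT_FIXED           = 1,
    FMT_INDEXABLE       = 2,
    FMT_FIXED_INDEXABLE = 3,
    FMT_WEAK            = 4,
    FMT_WORDS           = 6,
    FMT_LONGS           = 7,
    FMT_FIRST_BYTES     = 8,   /* 8-11: byte objects */
    FMT_FIRST_METHOD    = 12   /* 12-15: compiled methods */
};

/* True if objects of this format can hold object pointers:
   formats 1-4, plus compiled methods (their literals). */
static bool format_has_pointers(int fmt)
{
    return (fmt >= FMT_FIXED && fmt <= FMT_WEAK) || fmt >= FMT_FIRST_METHOD;
}

/* True if objects of this format hold raw bytes:
   byte objects (8-11) and compiled methods (12-15). */
static bool format_has_bytes(int fmt)
{
    return fmt >= FMT_FIRST_BYTES;
}
```

Compiled methods are the only formats for which both predicates hold,
which is why they need special handling.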
And the virtual machine (mostly the ObjectMemory class) knows the
difference between those numbers and how to deal with objects
depending on their format.
The bad side is that most of the semantics around formats is hardcoded
and spread across various methods, like:

lastPointerOf: oop
    "Return the byte offset of the last pointer field of the given object.
    Works with CompiledMethods, as well as ordinary objects.
    Can be used even when the type bits are not correct."
    | fmt sz methodHeader header contextSize |
    <inline: true>
    <asmLabel: false>
    header := self baseHeader: oop.
    fmt := self formatOfHeader: header.
    fmt <= 4 ifTrue: [(fmt = 3 and: [self isContextHeader: header])
            ifTrue: ["contexts end at the stack pointer"
                contextSize := self fetchStackPointerOf: oop.
                ^ CtxtTempFrameStart + contextSize * BytesPerWord].
        sz := self sizeBitsOfSafe: oop.
        ^ sz - BaseHeaderSize "all pointers"].
    fmt < 12 ifTrue: [^ 0]. "no pointers"

    "CompiledMethod: contains both pointers and bytes:"
    methodHeader := self longAt: oop + BaseHeaderSize.
    ^ (methodHeader >> 10 bitAnd: 255) * BytesPerWord + BaseHeaderSize

See how smelly this code is?

What I was thinking is: what if the number that denotes an object
format, instead of being just a number, pointed to a table of
functions that implement the behavior specific to that format?

Then the above method could be turned into something as short and nice
as the following:

lastPointerOf: oop
    <inline: true>
    <asmLabel: false>
    | table |
    table := self formatTableOf: oop.
    ^ table performSomeOperation: oop param: param

which, when translated to C, will look like:

    sqInt formatNumber;
    formatNumber = fetchFormatOf(oop);

    return ObjectFormats[formatNumber].performSomeOperation(oop, param);

So the table is a simple list of function pointers (closely resembling
a method dictionary in Smalltalk) that provides the implementation of
a certain behavior for a given object format:

struct ObjectFormatFunctions {
    int (*firstFixedSlot)(sqInt oop);
    int (*numFixedSlots)(sqInt oop);
    void *(*lastPointerOf)(sqInt oop);
    /* .. firstPointerOf: */
    /* .. numIndexableSlots: */
    /* .. indexableSlotSize: */
    /* ... etc */
};

So instead of hardcoding and spreading the smelly code around the VM,
we can write something like:

    (self formatTableOf: oop) performSomeOperation: oop params: params

and the code no longer needs to know whether the format is 0 or 4 or
whatever; it just dispatches to a function through the format table.
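To make the dispatch concrete, here is a minimal C sketch under heavy
simplifications. ToyObject is not the real Squeak object layout (the
format is stored directly instead of being packed into a header word),
and all function names except lastPointerOf are invented for the
example.

```c
#include <stddef.h>

typedef long sqInt;   /* stand-in for the VM's sqInt */

/* Toy object, NOT the real Squeak layout: the format number is stored
   directly, followed by a slot count and the slots themselves. */
typedef struct {
    sqInt format;
    sqInt numSlots;
    sqInt slots[4];
} ToyObject;

/* One entry per format: the operations the VM may ask of an object. */
struct ObjectFormatFunctions {
    sqInt (*numFixedSlots)(const ToyObject *obj);
    sqInt (*lastPointerOf)(const ToyObject *obj); /* byte offset, 0 if none */
};

/* Pointer-indexable objects: every slot is a pointer. */
static sqInt pointersNumFixed(const ToyObject *obj) { (void)obj; return 0; }
static sqInt pointersLastPointer(const ToyObject *obj)
{
    if (obj->numSlots == 0) return 0;
    return (sqInt)(offsetof(ToyObject, slots)
                   + (size_t)(obj->numSlots - 1) * sizeof(sqInt));
}

/* Byte objects: no pointers at all. */
static sqInt bytesNumFixed(const ToyObject *obj) { (void)obj; return 0; }
static sqInt bytesLastPointer(const ToyObject *obj) { (void)obj; return 0; }

/* Only formats 2 and 8 are filled in; the rest stay NULL in this sketch. */
static const struct ObjectFormatFunctions ObjectFormats[16] = {
    [2] = { pointersNumFixed, pointersLastPointer },
    [8] = { bytesNumFixed, bytesLastPointer },
};

/* The caller never compares the format against 0, 4, 12 and so on. */
static sqInt lastPointerOf(const ToyObject *obj)
{
    const struct ObjectFormatFunctions *table =
        &ObjectFormats[obj->format & 15];
    /* Formats without an entry are treated as "no pointers" here. */
    return table->lastPointerOf ? table->lastPointerOf(obj) : 0;
}
```

Supporting another format then means adding one more row to
ObjectFormats rather than touching every method that branches on the
format number.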

Then all methods that deal with object formats could be placed in a
nicely organized class structure and translated to C in the form of
tables.

The downside, of course, is that the additional level of indirection
will cause a serious slowdown in the interpreter, because the C
compiler doesn't directly see the implementation of an operation for a
given format and must call a function through a pointer held in the
corresponding format table.
Moreover, it cannot inline those functions, since dynamic dispatch
means it doesn't know which one will be used at a given place.

How can we deal with it?
Well, we can still hardcode the stuff (given that we are not
modifying the object format tables at runtime), so we can write:

fmt = 5 ifTrue: [ ^ self inlineFunctionOfFormat: 5 oop: oop params: params ]

This requires some trickery in the C code generator, but it is still
doable. So overall I think the drawbacks could be cleverly mitigated,
if not completely avoided.

So, let's talk about what opportunities it could give us:
The VM could support dynamically changeable object formats, if we
extend the number of possible formats beyond 16.
For the first 16 numbers things would be hardcoded as today, but for
higher numbers the VM would always dispatch using the format table.
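A small C sketch of that split, assuming one hypothetical operation
(the size in bytes of an indexable slot): the 16 known formats go
through direct calls the C compiler can inline, and anything from 16
up goes through a table that could be filled at run time. The table
and all function names are invented for the example.

```c
#include <stddef.h>

typedef long sqInt;                 /* stand-in for the VM's sqInt */
typedef sqInt (*SlotSizeFn)(void);  /* hypothetical per-format operation */

#define NUM_FORMATS 512

/* Filled at run time; entries 0..15 are never consulted. */
static SlotSizeFn DynamicFormats[NUM_FORMATS];

static inline sqInt wordSlotSize(void) { return 4; }
static inline sqInt longSlotSize(void) { return 8; }
static inline sqInt byteSlotSize(void) { return 1; }

/* Example of a format that could be registered at run time. */
static sqInt shortSlotSize(void) { return 2; }

static sqInt indexableSlotSize(unsigned fmt)
{
    if (fmt < 16) {
        /* Known formats: direct calls, visible to the C compiler. */
        if (fmt == 6) return wordSlotSize();
        if (fmt == 7) return longSlotSize();
        if (fmt >= 8) return byteSlotSize();           /* 8-15 */
        if (fmt >= 2 && fmt <= 4) return (sqInt)sizeof(sqInt);
        return 0;                                      /* 0, 1, 5 */
    }
    /* Formats from 16 up: indirect call through the table. */
    if (fmt < NUM_FORMATS && DynamicFormats[fmt] != NULL)
        return DynamicFormats[fmt]();
    return 0;   /* undefined format */
}
```

After `DynamicFormats[16] = shortSlotSize;`, a call such as
`indexableSlotSize(16)` takes the indirect path, while
`indexableSlotSize(6)` never touches the table.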

The cost of indirection can be amortized quite well, especially in the
presence of a JIT (it can simply inline the code for the corresponding
semantics for accessing various bits in the object when compiling the
methods of a class with such a format).

Then we could dynamically change formats (or create new ones) at run time.

But currently we have only 4 bits for the object format. Where can we get some more?
The solution is extremely simple! We have 5 bits in the object header
for the compact class index.
If we merge them with the object format bits we get 4+5 = 9 bits, which
gives 512 possible object formats!
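The bit arithmetic can be sketched in C as follows. The field
positions are an assumption of this example (format in bits 8-11 of a
32-bit base header, compact class index directly above it in bits
12-16), and the function names are invented.

```c
#include <stdint.h>

/* Assumed positions within the 32-bit base header (see lead-in). */
#define FORMAT_SHIFT        8
#define FORMAT_BITS         4
#define COMPACT_CLASS_SHIFT 12
#define COMPACT_CLASS_BITS  5

/* Today's 4-bit format: 0..15. */
static unsigned formatOfHeader(uint32_t header)
{
    return (header >> FORMAT_SHIFT) & ((1u << FORMAT_BITS) - 1);
}

/* Today's 5-bit compact class index: 0..31. */
static unsigned compactClassIndexOfHeader(uint32_t header)
{
    return (header >> COMPACT_CLASS_SHIFT) & ((1u << COMPACT_CLASS_BITS) - 1);
}

/* Both fields read as one 9-bit number: 0..511.
   This works with a single shift and mask only because the two
   fields are assumed to be adjacent. */
static unsigned extendedFormatOfHeader(uint32_t header)
{
    return (header >> FORMAT_SHIFT)
           & ((1u << (FORMAT_BITS + COMPACT_CLASS_BITS)) - 1);
}
```

Under this layout the extended number equals
`compactClassIndex * 16 + format`, so a header whose compact class
index is 0 keeps its current format number unchanged.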

The trick is that we can still identify special objects pretty easily
by reserving a concrete format number for each of them (hey, we have
plenty of numbers).
So format numbers in the range 0..15 will identify the hardcoded
formats, known by the VM from the beginning.
Formats 16..48 are for special objects,
and the rest, from 49 up to 511, are dynamically defined.
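The three ranges can be written down as a tiny C classifier. The enum
and function names are invented for this sketch; the boundaries are
the ones given in the text.

```c
/* Hypothetical classification of a 9-bit format number. */
typedef enum {
    FORMAT_HARDCODED,   /* 0..15: known by the VM from the beginning */
    FORMAT_SPECIAL,     /* 16..48: reserved for special objects */
    FORMAT_DYNAMIC,     /* 49..511: defined at run time */
    FORMAT_INVALID      /* does not fit in 9 bits */
} FormatKind;

static FormatKind kindOfFormat(unsigned fmt)
{
    if (fmt <= 15)  return FORMAT_HARDCODED;
    if (fmt <= 48)  return FORMAT_SPECIAL;
    if (fmt <= 511) return FORMAT_DYNAMIC;
    return FORMAT_INVALID;
}
```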

Of course, we could change the object header layout for a new VM, and
then there may be more (or fewer) than 9 bits for the object format.
It doesn't really matter, because the concept remains the same:
the only field the VM knows it can safely access by itself is the
object header. To access other fields (if any) it should use the
functions provided in the object format table.

And the last thing: why did I title the topic "format as a contract"?

Because the next logical step is to record the formats in a kind of
manifest and store it in the image file.
Then the VM, when booting an image, will read the manifest and
translate it into machine code before the first attempt to access any
object in object memory.
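A very reduced C sketch of the boot step, with several assumptions:
the manifest record layout is invented, the manifest is already in
memory rather than read from an image file, and instead of generating
machine code the loader only fills a descriptor table. Formats up to
48 are treated as reserved, following the ranges given earlier.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIRST_DYNAMIC_FORMAT 49
#define NUM_FORMATS          512

/* Hypothetical manifest record, as it might be stored in the image. */
typedef struct {
    unsigned formatNumber;
    unsigned numFixedSlots;
    unsigned indexableSlotSize;   /* bytes per indexable slot, 0 if none */
    bool     slotsArePointers;
} FormatManifestEntry;

/* What the VM keeps per format after loading the manifest. */
typedef struct {
    bool     defined;
    unsigned numFixedSlots;
    unsigned indexableSlotSize;
    bool     slotsArePointers;
} FormatDescriptor;

static FormatDescriptor FormatTable[NUM_FORMATS];

/* Install every acceptable manifest entry and return how many were
   installed. Entries that would redefine a reserved format, or that
   do not fit in 9 bits, are skipped. */
static size_t loadManifest(const FormatManifestEntry *entries, size_t count)
{
    size_t installed = 0;
    for (size_t i = 0; i < count; i++) {
        unsigned n = entries[i].formatNumber;
        if (n < FIRST_DYNAMIC_FORMAT || n >= NUM_FORMATS)
            continue;
        FormatTable[n].defined           = true;
        FormatTable[n].numFixedSlots     = entries[i].numFixedSlots;
        FormatTable[n].indexableSlotSize = entries[i].indexableSlotSize;
        FormatTable[n].slotsArePointers  = entries[i].slotsArePointers;
        installed++;
    }
    return installed;
}
```

The point of the sketch is only the ordering: the table is complete
before any object is touched, so the image and the VM agree on the
formats up front.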

So, what do you think about it?

It could be overkill, but I like the idea that with such an approach
we could change object formats dynamically at run time, without
needing to change the VM.
It also structures the VM code quite nicely, which serves clarity.
It also opens a wide field for experiments with different object formats.

--
Best regards,
Igor Stasenko AKA sig.

Wolfgang Eder

Re: Object format(s) as a contract with VM?

hi igor,
i have in the past pondered this question many times,
and have not yet come to satisfying conclusions.
the big question (for me) is where exactly to draw the
line. what are the assumptions that are fixed,
and what not. i mean, e.g.

what would be the interface (contract) that the garbage
collector would use?

is there a contract to the object memory itself
(e.g. allocate new memory, make an object fixed in memory,
etc)?

can we extend not only object representations, but
also compose different object memory strategies
(like, object table vs direct pointers)?

just my thoughts,
thanks
wolfgang

Igor Stasenko

Re: Object format(s) as a contract with VM?

On 29 July 2011 11:41, Wolfgang Eder <[hidden email]> wrote:

>
> hi igor,
> i have in the past pondered this question many times,
> and have not yet come to satisfying conclusions.
> the big question (for me) is where exactly to draw the
> line. what are the assumptions that are fixed,
> and what not. i mean, e.g.
>
> what would be the interface (contract) that the garbage
> collector would use?
>
> is there a contract to the object memory itself
> (e.g. allocate new memory, make an object fixed in memory,
> etc)?
>
> can we extend not only object representations, but
> also compose different object memory strategies
> (like, object table vs direct pointers)?
>

There are plenty of options. What is hard is taking all the
tradeoffs into account.

One of the extreme approaches I thought about is to turn the object
header into a pointer to a function
of the form (selector, arguments ...), which then looks like a
low-level message send.

Then anything you can do to an object would be available only through
a call to this function with the corresponding arguments,
including accessing the object's state, sending a message, retrieving
its class, GC, etc.

Then there will be very few fixed points. The problem, of course, is
how to apply optimizations in the presence of such an extremely
late-bound interface.
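A toy C sketch of that extreme variant, where the first word of every
object is a function taking a selector and an argument. To keep it
short it uses a single integer argument instead of a variable argument
list; the selectors, the Object layout and the class id 42 are all
made up.

```c
typedef long sqInt;   /* stand-in for the VM's sqInt */

/* Hypothetical low-level selectors. */
typedef enum { SEL_CLASS_OF, SEL_SLOT_COUNT, SEL_SLOT_AT } Selector;

struct Object;
typedef sqInt (*Behavior)(struct Object *self, Selector sel, sqInt arg);

typedef struct Object {
    Behavior header;    /* the header word is itself the dispatch function */
    sqInt    slots[2];
} Object;

/* Behavior of a two-slot object; nothing outside this function
   knows how the slots are laid out. */
static sqInt pairBehavior(Object *self, Selector sel, sqInt arg)
{
    switch (sel) {
    case SEL_CLASS_OF:   return 42;   /* made-up class id */
    case SEL_SLOT_COUNT: return 2;
    case SEL_SLOT_AT:    return (arg >= 0 && arg < 2) ? self->slots[arg] : -1;
    }
    return -1;   /* unknown selector */
}

/* The only fixed point: call whatever the header points to. */
static sqInt send(Object *obj, Selector sel, sqInt arg)
{
    return obj->header(obj, sel, arg);
}
```

Every access, even reading a slot, pays for an indirect call here,
which is exactly the optimization problem mentioned above.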

--
Best regards,
Igor Stasenko AKA sig.

Mariano Martinez Peck

Re: Object format(s) as a contract with VM?

On Fri, Jul 29, 2011 at 2:47 AM, Igor Stasenko <[hidden email]> wrote:

> So instead of hardcoding and spreading the smelly code around the VM,
> we can write something like:
>
>     (self formatTableOf: oop) performSomeOperation: oop params: params
>
> and the code no longer needs to know whether the format is 0 or 4 or
> whatever; it just dispatches to a function through the format table.

That's polymorphism in C :)

Igor: that is exactly what OpenDBX does. It works like this: OpenDBX has a "core". The core is abstract and general, and does not depend on the backend at all. Then there is an openDBX-mysql.c, an openDBX-postgresql.c, etc. Each of those files is the glue/mapping between the OpenDBX API and a database client library, so each of them implements a number of functions (there are around 20). Imagine:

In openDBX-mysql.c:
opendbx_init -> opendbx_init_mysql
opendbx_connect -> opendbx_connect_mysql
....

In openDBX-postgresql.c:
opendbx_init -> opendbx_init_postgresql
opendbx_connect -> opendbx_connect_postgresql
....

The OpenDBX core needs to invoke some of those functions (such as opendbx_init, opendbx_connect, etc.) defined in the API between OpenDBX and the database client library. But the core always works the same way: there is a table in memory that maps function names to real functions. So OpenDBX just looks up the key in the table and invokes the function stored there, no matter what its name is :)
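The name-to-function table described here can be sketched in C as
below. This is not OpenDBX code: every identifier is hypothetical and
the functions only return marker values so the dispatch is visible.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical backend operation; a real one would take arguments. */
typedef int (*BackendFn)(void);

struct BackendEntry {
    const char *name;   /* key the core asks for, e.g. "init" */
    BackendFn   fn;     /* backend-specific implementation */
};

/* Stand-ins for the per-backend glue functions. */
static int init_mysql(void)         { return 1; }
static int connect_mysql(void)      { return 2; }
static int init_postgresql(void)    { return 10; }
static int connect_postgresql(void) { return 20; }

static const struct BackendEntry mysqlBackend[] = {
    { "init",    init_mysql },
    { "connect", connect_mysql },
};

static const struct BackendEntry postgresqlBackend[] = {
    { "init",    init_postgresql },
    { "connect", connect_postgresql },
};

/* The core only knows the key; it returns NULL if the backend
   does not provide that operation. */
static BackendFn lookup(const struct BackendEntry *table, size_t n,
                        const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].fn;
    return NULL;
}
```

The same `lookup(..., "connect")` call ends up in a different function
depending on which table is passed, which is the polymorphism referred
to above.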

--
Mariano
http://marianopeck.wordpress.com

Igor Stasenko

Re: Object format(s) as a contract with VM?

On 29 July 2011 22:59, Mariano Martinez Peck <[hidden email]> wrote:
>
>
> That's polymorphism in C :)
>

Yes.
The main question is whether it's possible to introduce it without a
significant speed loss.
While in OpenDBX you may do whatever you want (kind of ;),
playing with the object format will have a huge impact on everything.

Also, there's the question of whether it's really worth doing it like
that. The existing 16 object formats more or less cover most of our
needs, so what would other (new) object formats be used for?

> Igor: exactly that is what OpenDBX does.

[snip]

--
Best regards,
Igor Stasenko AKA sig.

ccrraaiigg

Re: Object format(s) as a contract with VM?

> The main question is whether it's possible to introduce it without a
> significant speed loss. While in OpenDBX you may do whatever you want
> (kind of ;), playing with the object format will have a huge impact
> on everything.
>
> Also, there's the question of whether it's really worth doing it like
> that. The existing 16 object formats more or less cover most of our
> needs, so what would other (new) object formats be used for?

And of course, I want to use format 5 for something else. :)

-C

--
Craig Latta
www.netjam.org/resume
+31 6 2757 7177
+ 1 415 287 3547