Direct object pointers vs indirect ones pros and cons


Direct object pointers vs indirect ones pros and cons

Igor Stasenko
 
Hi, all

RoarVM uses an object table, while the Squeak VM uses direct pointers
to objects.

This is a basic element of VM design, and I wonder how much impact it
has on overall VM speed.

Both variants have their own advantages and disadvantages, but I think
that with a good JIT the extra indirection could be almost
insignificant. And indirect pointers to objects open up quite sexy
perspectives. Being able to freely choose an object's location means that:
  - it's quite easy to implement object memory paging (swapping between
memory and disk),
  - particular objects can be placed in special memory locations (good
for FFI, object pinning, etc.),
  - #become: is O(1) instead of O(heap size).

The downside of indirect pointers is, of course, higher memory
traffic, which directly impacts all operations everywhere.
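A minimal C sketch of the trade-off (all names here -- ObjEntry, obj_table, deref, become -- are invented for illustration; neither Squeak nor RoarVM is organized exactly like this): with an object table, an oop indexes the table rather than addressing the object directly, so exchanging two identities touches only two table slots.

```c
#include <assert.h>

/* Illustrative sketch only, not actual VM code. */

typedef struct { int fields[4]; } Object;
typedef struct { Object *addr; } ObjEntry;

static ObjEntry obj_table[1024];   /* oop -> current object address */

/* Every access pays one extra load compared to a direct pointer... */
static Object *deref(int oop) { return obj_table[oop].addr; }

/* ...but #become: is just a swap of two table entries, O(1).  With
   direct pointers the VM must instead scan the whole heap and rewrite
   every reference to the two objects, which is O(heap size). */
static void become(int a, int b) {
    Object *tmp = obj_table[a].addr;
    obj_table[a].addr = obj_table[b].addr;
    obj_table[b].addr = tmp;
}
```

The swap also makes object relocation (paging, pinning) trivial: only the table entry needs updating, never the references.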

What else?
I'd like to know what you think about it, and why the Squeak VM, in
particular, uses direct object pointers.
What is this choice based on? Maybe I am missing something.

--
Best regards,
Igor Stasenko AKA sig.

Re: Direct object pointers vs indirect ones pros and cons

Andreas.Raab
 
On 11/12/2010 12:05 AM, Igor Stasenko wrote:
> Both variants having own advantages/disadvantages, while i think, that
> with good JIT an extra indirection could be
> almost insignificant.

This ignores the cost of memory access.

> I'd like to know, what you thinking about it, and why Squeak VM, in
> particular, using direct object pointers?

Performance.

> What are this choice based on? I'd like to know. Maybe i missing something.

http://ftp.squeak.org/docs/OOPSLA.Squeak.html:

The Object Memory

The design of an object memory that is general and yet compact is not
simple. We all agreed immediately on a number of parameters, though. For
efficiency and scalability to large projects, we wanted a 32-bit address
space with direct pointers (i.e., a system in which an object reference
is just the address of that object in memory). The design had to support
all object formats of our existing Smalltalk. It must be amenable to
incremental garbage collection and compaction. Finally, it must be able
to support the "become" operation (exchange identity of two objects) to
the degree required in normal Smalltalk system operation. "... etc ..."

(also see the section on storage management)

And if in doubt drop a note to dan ingalls at sap dot com and you'll get
the answer straight from the source :-)

Cheers,
   - Andreas

Re: Direct object pointers vs indirect ones pros and cons

Igor Stasenko
 
Sorry, Andreas. Maybe I wasn't clear: I don't want brief answers, I
need details. :)

I would like to hear your opinion on this in context: what if you were
to design a VM from scratch, and had direct access to a highly
optimizing compiler/JIT? What would be your choice?

I read the Squeak VM design description before.
Still, it would be good to know if the impact of indirect pointers can
be measured, and how.


--
Best regards,
Igor Stasenko AKA sig.

Re: Direct object pointers vs indirect ones pros and cons

Igor Stasenko
 
Here is a 'simulated' kind of indirect pointer.

The Wrapper class has an 'object' ivar,
and in this way we simulate the indirection.


| objects wrapped t1 t2 t3 |
objects := (1 to: 1000) collect: [:i | Object new ].
wrapped := objects collect: [:each | Wrapper new object: each ].

t1 := [ 100000 timesRepeat: [ objects do: [ :each | each yourself ] ] ] timeToRun.
t2 := [ 100000 timesRepeat: [ wrapped do: [ :each | each object yourself ] ] ] timeToRun.
t3 := [ 100000 timesRepeat: [ wrapped do: [ :each | ] ] ] timeToRun.
{ t1. t2. t3 }

Running on Cog it gives:

 #(3241 3498 2793)

The first benchmark roughly measures the time to access objects directly,
the second measures indirect access,
and the third measures the loop overhead.

So this naive benchmark gives:

(3498 - 2793) / (3241 - 2793) asFloat
1.573660714285714

so, roughly 57% slower.

But actually this benchmark shows the cost of an extra message send
rather than the impact of an extra level of indirection.
Well, a message is a kind of indirection... :)
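The same comparison can be sketched in C, stripped of message-send overhead (a toy model with invented names, not VM code): the indirect version performs one extra, data-dependent load per element, which is exactly what an object-table dereference adds to every access.

```c
#include <assert.h>

enum { N = 1000 };

/* Direct access: one load per element. */
static long sum_direct(const int *vals, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += vals[i];
    return s;
}

/* Indirect access: load the pointer, then load the value -- two
   dependent loads per element, like going through an object table. */
static long sum_indirect(int *const *table, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += *table[i];
    return s;
}
```

Timing these two loops over a table large enough to spill out of cache would measure the raw indirection cost without any send overhead.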

--
Best regards,
Igor Stasenko AKA sig.

Re: Direct object pointers vs indirect ones pros and cons

Stefan Marr
In reply to this post by Igor Stasenko

Hi Igor:

On 12 Nov 2010, at 10:32, Igor Stasenko wrote:

> I would like to hear your opinion on that in context: what if you
> would design a VM from scratch,
> and having a direct access to high-optimization compiler/jit. What
> would be your choice?
I don't think you will get a satisfying answer to that question.
It might be that on certain processors the caches are big enough to actually hide the overhead of an object table in such a scenario.

But, by definition, caches are always too small.

I think we still have the source code of David's version of the RoarVM lying around, the one that does not use an object table. With 'a bit of work' it would be possible to measure that overhead for our interpreter on a single core. So, if you feel like it, I could give you a hand here and there ;)

Best regards
Stefan


--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525


Re: Direct object pointers vs indirect ones pros and cons

Igor Stasenko

On 12 November 2010 11:59, Stefan Marr <[hidden email]> wrote:

>
> Hi Igor:
>
> On 12 Nov 2010, at 10:32, Igor Stasenko wrote:
>
>> I would like to hear your opinion on that in context: what if you
>> would design a VM from scratch,
>> and having a direct access to high-optimization compiler/jit. What
>> would be your choice?
> I don't think you will get a satisfying answer to that question.
> It might be that on certain processors the caches are big enough to actually hide the overhead of an object table in such a scenario.
>
> But, by definition, caches are always to small.
>
> I think we have still the source code of David's version of the RoarVM lying around that does not use an object table. With 'a bit of work' it would be possible to measure that overhead for our interpreter for a single core. So, if you feel like it, I could give you a hand here and there ;)
>
Well, if it's not too much work to run some simple benchmarks :)

> Best regards
> Stefan
>
>
> --
> Stefan Marr
> Software Languages Lab
> Vrije Universiteit Brussel
> Pleinlaan 2 / B-1050 Brussels / Belgium
> http://soft.vub.ac.be/~smarr
> Phone: +32 2 629 2974
> Fax:   +32 2 629 3525
>
>



--
Best regards,
Igor Stasenko AKA sig.

Re: Direct object pointers vs indirect ones pros and cons

Levente Uzonyi-2
In reply to this post by Igor Stasenko
 
On Fri, 12 Nov 2010, Igor Stasenko wrote:

>
> Here a 'simulated' kind of indirect pointers.
>
> The Wrapper class having an 'object' ivar
> and in this way, we simulating indirection.
>
>
> | objects wrapped t1 t2 t3 |
> objects := (1 to: 1000) collect: [:i | Object new ].
> wrapped := objects collect: [:each | Wrapper new object: each ].
>
> t1 := [ 100000 timesRepeat: [ objects do:[ :each |  each yourself ] ]
> ] timeToRun.
> t2 := [ 100000 timesRepeat: [ wrapped do:[ :each |  each object
> yourself ] ] ] timeToRun.
> t3 := [ 100000 timesRepeat: [ wrapped do:[ :each |  ] ] ] timeToRun.
> {t1. t2. t3}
>
> Running on Cog it gives:
>
> #(3241 3498 2793)

A single measurement is probably inaccurate. This benchmark creates lots
of blocks, which means GC noise. The size of the "object table" is too
small to be realistic, which hides cache-related performance hits. Why
don't you use an Array instead of Wrapper? IIRC Cog has an optimization
for the #at: primitive, and Arrays are compact. Why do you send #yourself
and #timesRepeat:? You should also shuffle the objects slightly to be
more realistic about cache usage.
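The shuffling suggestion can be sketched in C (a hypothetical helper, not code from the thread): a Fisher-Yates shuffle of the pointer table turns a sequential traversal into near-random accesses, defeating the hardware prefetcher and exposing the cache cost of the indirection.

```c
#include <assert.h>
#include <stdlib.h>

/* Shuffle the entries of a pointer table in place (Fisher-Yates).
   Traversing the shuffled table then hits memory in a scattered
   order, closer to a real object table after allocation and GC. */
static void shuffle(int **table, int n, unsigned seed) {
    srand(seed);
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int *tmp = table[i];
        table[i] = table[j];
        table[j] = tmp;
    }
}
```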


Levente

>
> the first bench is kind-of 'measure time to access directly to objects'
> the second one is 'measure indirect access'
> and third is measure a loop overhead.
>
> So, by taking this naive benchmark, we got:
>
> (3498 - 2793) / (3241 - 2793) asFloat
> 1.573660714285714
>
> so, 50% slower.
>
> But actually this benchmarks shows a cost of extra message send rather
> than impact of extra level of indirection.
> Well, a message is a kind of indirection..  :)
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>

Re: Direct object pointers vs indirect ones pros and cons

Igor Stasenko

On 12 November 2010 15:24, Levente Uzonyi <[hidden email]> wrote:

>
> On Fri, 12 Nov 2010, Igor Stasenko wrote:
>
>>
>> Here a 'simulated' kind of indirect pointers.
>>
>> The Wrapper class having an 'object' ivar
>> and in this way, we simulating indirection.
>>
>>
>> | objects wrapped t1 t2 t3 |
>> objects := (1 to: 1000) collect: [:i | Object new ].
>> wrapped := objects collect: [:each | Wrapper new object: each ].
>>
>> t1 := [ 100000 timesRepeat: [ objects do:[ :each |  each yourself ] ]
>> ] timeToRun.
>> t2 := [ 100000 timesRepeat: [ wrapped do:[ :each |  each object
>> yourself ] ] ] timeToRun.
>> t3 := [ 100000 timesRepeat: [ wrapped do:[ :each |  ] ] ] timeToRun.
>> {t1. t2. t3}
>>
>> Running on Cog it gives:
>>
>> #(3241 3498 2793)
>
> Single measurement is probably inaccurate. This benchmark creates lots of
> blocks, which means GC noise. The size of the "object table" is too small to
> be realistic, this hides cache related performance hits. Why don't you use
> an Array instead of Wrapper? IIRC Cog has optimization for the #at:
> primitive, also Arrays are compact. Why do you send #yourself and
> #timesRepeat:? You should slightly shuffle the objects to be more realistic
> about cache usage.
>

I'm inviting you to make your own version of the benchmark, one which
could simulate an extra level of indirection for accessing object
fields.

Actually, I was trying:
 t1 := [ 100000 timesRepeat: [ objects do: [ :each | each yourself ] ] ]
vs
 t2 := [ 100000 timesRepeat: [ objects do: [ :each | each object ] ] ]

where #yourself is there to compensate for the extra message send
(#object), so it compares
'read nothing, return self' with 'read ivar', which is an indirection.
But what I found is that this gives no difference, and actually t2 < t1
sometimes :)

>
> Levente
>
>>
>> the first bench is kind-of 'measure time to access directly to objects'
>> the second one is 'measure indirect access'
>> and third is measure a loop overhead.
>>
>> So, by taking this naive benchmark, we got:
>>
>> (3498 - 2793) / (3241 - 2793) asFloat
>> 1.573660714285714
>>
>> so, 50% slower.
>>
>> But actually this benchmarks shows a cost of extra message send rather
>> than impact of extra level of indirection.
>> Well, a message is a kind of indirection..  :)
>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.
>>
>



--
Best regards,
Igor Stasenko AKA sig.

Re: Direct object pointers vs indirect ones pros and cons

Bert Freudenberg

On 12.11.2010, at 14:41, Igor Stasenko wrote:

> I'm inviting you to make own version of benchmark

I don't think this can be realistically simulated inside Squeak. But possibly you could change the macros in sqMemoryAccess.h to fake an object table access?

I just tried that. Using tinyBenchmarks, bytecode performance drops to 63% and sends to 78%.

Now, declaring that variable volatile might be overkill, as it prevents all caching, but I couldn't quite figure out a more realistic declaration.

- Bert -

#else
# ifndef FAKE_OBJ_TABLE
# define FAKE_OBJ_TABLE
  static volatile int FakeObjTable= 0;
# define OBJTABLELOOKUP(oop) (oop + FakeObjTable)
# endif
  /* Use macros when static inline functions aren't efficient. */
# define byteAtPointer(ptr) ((sqInt)(*((unsigned char *)(OBJTABLELOOKUP(ptr)))))
# define byteAtPointerput(ptr, val) ((sqInt)(*((unsigned char *)(OBJTABLELOOKUP(ptr)))= (unsigned char)(val)))
# define shortAtPointer(ptr) ((sqInt)(*((short *)(OBJTABLELOOKUP(ptr)))))
# define shortAtPointerput(ptr, val) ((sqInt)(*((short *)(OBJTABLELOOKUP(ptr)))= (short)(val)))
# define intAtPointer(ptr) ((sqInt)(*((unsigned int *)(OBJTABLELOOKUP(ptr)))))
# define intAtPointerput(ptr, val) ((sqInt)(*((unsigned int *)(OBJTABLELOOKUP(ptr)))= (int)(val)))
# define longAtPointer(ptr) ((sqInt)(*((sqInt *)(OBJTABLELOOKUP(ptr)))))
# define longAtPointerput(ptr, val) ((sqInt)(*((sqInt *)(OBJTABLELOOKUP(ptr)))= (sqInt)(val)))
# define oopAtPointer(ptr) (sqInt)(*((sqInt *)OBJTABLELOOKUP(ptr)))
# define oopAtPointerput(ptr, val) (sqInt)(*((sqInt *)OBJTABLELOOKUP(ptr))= (sqInt)val)
# define pointerForOop(oop) ((char *)(sqMemoryBase + ((usqInt)(oop))))
# define oopForPointer(ptr) ((sqInt)(((char *)(ptr)) - (sqMemoryBase)))
# define byteAt(oop) byteAtPointer(pointerForOop(oop))
# define byteAtput(oop, val) byteAtPointerput(pointerForOop(oop), (val))
# define shortAt(oop) shortAtPointer(pointerForOop(oop))
# define shortAtput(oop, val) shortAtPointerput(pointerForOop(oop), (val))
# define longAt(oop) longAtPointer(pointerForOop(oop))
# define longAtput(oop, val) longAtPointerput(pointerForOop(oop), (val))
# define intAt(oop) intAtPointer(pointerForOop(oop))
# define intAtput(oop, val) intAtPointerput(pointerForOop(oop), (val))
# define oopAt(oop) oopAtPointer(pointerForOop(oop))
# define oopAtput(oop, val) oopAtPointerput(pointerForOop(oop), (val))
#endif


Re: Direct object pointers vs indirect ones pros and cons

Jecel Assumpcao Jr
In reply to this post by Igor Stasenko
 
Igor,

those of us who design our own hardware have options that are not
available when using conventional processors. In the case of object
tables, we can use virtually addressed object caches (invented in the
Mushroom project - http://www.wolczko.com/mushroom/index.html) to
eliminate most of the cost.

In a conventional processor, think about what happens when we execute an
instruction like

load R3, R7, R1

where R1 has the number of the instance variable we want to read
(multiplied by the word size, depending on the processor), R7 is the oop
for the object and R3 will store the value of the instance variable. The
first step is that R7 and R1 are added and the result is the virtual
address of the instance variable. Then the top (20 or so) bits will be
searched in the TLB (translation look-aside buffer) of the MMU (memory
management unit) and, if found there, they will be replaced with the
associated bits, forming the physical address of the instance variable.
The last step is that the top bits of the physical address (28 bits in
the case of a cache with lines of 16 bytes) are used to find the right
line in the data cache and the bottom bits will select the bytes from
that line to be loaded into R3.

Of course, sometimes the "page" isn't in the TLB or the data cache
doesn't have the needed line, but let's not worry about that for now.

Imagine that we redesign our processor so that the same instruction
works like this: we concatenate R7 and R1 into a 64-bit virtual instance
variable address and use the top 60 bits to find the right line in the
data cache, and the bottom 4 bits to select the bytes from that line to
be loaded into R3. We have saved one addition and one MMU lookup at the
cost of a larger tag for the cache. An additional cost is that two
objects can't share the same cache line like they can in the
conventional processor, but that doesn't hurt much.

When we can't find the cache line we need, we have to bring in data from
the main memory. That can be done by adding R7 and R1, masking the
bottom 4 bits, doing the MMU lookup and fetching the 16 bytes from the
result. This will be compatible with the direct pointer Squeak. But we
could instead use R7 as an index into an object table, fetch the base
address, add R1 to that, mask the bottom 4 bits, do a MMU lookup (or not
- the object table itself could double as a virtual memory system) and
fetch the 16 bytes into the new cache line. Since cache misses are rare,
the extra memory access here does not impact performance very much.

Note that virtual caches are considered a bad thing in the C world
because of aliasing problems: two virtual addresses might map to the
same physical address and then you could have two copies of the same
data in the cache and no way to keep them consistent. With object
addressing, this is much easier to avoid.
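The refill path described above can be modeled in plain C (a toy model with invented names and sizes, not a hardware description): the cache is keyed by the (oop, field) pair directly, and only a miss consults the object table, so the extra lookup stays off the common path.

```c
#include <assert.h>

enum { LINES = 64 };

/* One direct-mapped cache line: the tag is the (oop, field) pair. */
typedef struct { long tag; int value; int valid; } Line;

typedef struct {
    Line lines[LINES];
    int *const *obj_table;   /* oop -> object body in "main memory" */
    int misses;
} ObjCache;

/* Hit: no address arithmetic, no table access.  Miss: one object-table
   lookup to find the object's body, then fill the line. */
static int read_field(ObjCache *c, int oop, int field) {
    long tag = ((long)oop << 16) | (long)field;   /* concatenate, don't add */
    Line *ln = &c->lines[tag % LINES];
    if (!ln->valid || ln->tag != tag) {
        c->misses++;                              /* slow path only */
        ln->tag = tag;
        ln->value = c->obj_table[oop][field];
        ln->valid = 1;
    }
    return ln->value;
}
```

Because the table lookup happens only on misses, its cost is amortized the same way a TLB miss is on a conventional processor.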

-- Jecel


Re: Direct object pointers vs indirect ones pros and cons

David T. Lewis
In reply to this post by Igor Stasenko
 
On Fri, Nov 12, 2010 at 11:58:59AM +0200, Igor Stasenko wrote:
>  
> Here a 'simulated' kind of indirect pointers.
>
> The Wrapper class having an 'object' ivar
> and in this way, we simulating indirection.

If you want to measure the effects of an extra level of indirection at a
low level, you may want to try hacking the MemoryAccess Slang version of
the C macros.

These implement low-level memory access at the level of pointerForOop and
oopForPointer and such. On an interpreter VM they run at the same speed as
the actual C macros, to the best of my ability to measure (this is due to
the effectiveness of the Slang inliner, which works surprisingly well).

If you use class MemoryAccess as a pattern (either write your own, or modify
this one), then you might be able to do experiments like this entirely at
the C level, which may give you a better idea of the tradeoffs.

The code is in SqS/VMMaker in package MemoryAccess. A few changes are
needed in the C headers in the platforms sources, which can be found here:
  http://wiki.squeak.org/squeak/uploads/6081/MemoryAccessPlatformDiffs.zip
along with some explanations:
  http://wiki.squeak.org/squeak/6081

HTH,
Dave


Re: Direct object pointers vs indirect ones pros and cons

David T. Lewis
 
On Fri, Nov 12, 2010 at 02:03:44PM -0500, David T. Lewis wrote:
>
> If you want to measure the effects of an extra level of indirection at a
> low level, you may want to try hacking the MemoryAccess slang version of
> the C macros.

Oops, sorry, I didn't notice that Bert had already done this experiment
and posted the results:

On Fri, Nov 12, 2010 at 04:44:42PM +0100, Bert Freudenberg wrote:

>
> I don't think this can be realistically simulated inside Squeak. But
> possibly you could change the macros in sqMemoryAccess.h to fake an
> object table access?
>
> I just tried that. Using tinyBenchmarks, byte code performance drops
> to 63% and sends to 78%.
>
> Now declaring that variable volatile might be overkill as it prevents all
> caching, but I couldn't quite figure out a more realistic declaration.
>
> - Bert -

Re: Direct object pointers vs indirect ones pros and cons

Igor Stasenko
In reply to this post by Bert Freudenberg

On 12 November 2010 17:44, Bert Freudenberg <[hidden email]> wrote:

>
> On 12.11.2010, at 14:41, Igor Stasenko wrote:
>
>> I'm inviting you to make own version of benchmark
>
> I don't think this can be realistically simulated inside Squeak. But possibly you could change the macros in sqMemoryAccess.h to fake an object table access?
>
> I just tried that. Using tinyBenchmarks, byte code performance drops to 63% and sends to 78%.
>
> Now declaring that variable volatile might be overkill as it prevents all caching, but I couldn't quite figure out a more realistic declaration.
>
You mean that a non-volatile declaration like:

int FakeObjTable = 0;

could be optimized away by the compiler?
Well, since the compiler compiles module by module (separate C files),
if you remove 'static'
it is no longer able to optimize it away, since it can't guess
what may happen to this variable in another object file:
even if one module has only read-only access to it, some
other module could contain code which modifies it.

So, I think this is the worst-case performance slowdown. :)

If we take into account that to get an object's location you need to do
the object table lookup only once,
and that any subsequent read/write operations on the object then won't
require a table lookup, this can be improved.
Consider, for example, that to read an ivar the interpreter reads & checks
the header, and only then the ivar slot, so it should cost:
1 table lookup and 2 reads at the object's location,
instead of 2 table lookups + 2 reads at the object's location.
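This optimization can be sketched in C (invented names and a toy object layout, not Squeak's actual header format): translate the oop through the table once, keep the resulting base pointer, and do both the header check and the slot read against it.

```c
#include <assert.h>

/* Toy layout: a header word (low byte = slot count here) followed by
   the slots.  Purely illustrative. */
typedef struct { int header; int slots[4]; } Obj;

/* One table lookup serves both the header check and the field read:
   1 lookup + 2 reads, instead of 2 lookups + 2 reads. */
static int read_ivar(Obj *const *obj_table, int oop, int index) {
    Obj *base = obj_table[oop];             /* the only table lookup */
    assert(index < (base->header & 0xFF));  /* header read + bounds check */
    return base->slots[index];              /* slot read, same base */
}
```

Note Levente's objection below: the cached base pointer is only safe as long as no #become: (or compacting GC) can run between the lookup and the reads.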

> - Bert -
>
> #else
> # ifndef FAKE_OBJ_TABLE
> # define FAKE_OBJ_TABLE
>  static volatile int FakeObjTable= 0;
> # define OBJTABLELOOKUP(oop) (oop + FakeObjTable)
> # endif
>  /* Use macros when static inline functions aren't efficient. */
> # define byteAtPointer(ptr)             ((sqInt)(*((unsigned char *)(OBJTABLELOOKUP(ptr)))))
> # define byteAtPointerput(ptr, val)     ((sqInt)(*((unsigned char *)(OBJTABLELOOKUP(ptr)))= (unsigned char)(val)))
> # define shortAtPointer(ptr)            ((sqInt)(*((short *)(OBJTABLELOOKUP(ptr)))))
> # define shortAtPointerput(ptr, val)    ((sqInt)(*((short *)(OBJTABLELOOKUP(ptr)))= (short)(val)))
> # define intAtPointer(ptr)              ((sqInt)(*((unsigned int *)(OBJTABLELOOKUP(ptr)))))
> # define intAtPointerput(ptr, val)      ((sqInt)(*((unsigned int *)(OBJTABLELOOKUP(ptr)))= (int)(val)))
> # define longAtPointer(ptr)             ((sqInt)(*((sqInt *)(OBJTABLELOOKUP(ptr)))))
> # define longAtPointerput(ptr, val)     ((sqInt)(*((sqInt *)(OBJTABLELOOKUP(ptr)))= (sqInt)(val)))
> # define oopAtPointer(ptr)              (sqInt)(*((sqInt *)OBJTABLELOOKUP(ptr)))
> # define oopAtPointerput(ptr, val)      (sqInt)(*((sqInt *)OBJTABLELOOKUP(ptr))= (sqInt)val)
> # define pointerForOop(oop)             ((char *)(sqMemoryBase + ((usqInt)(oop))))
> # define oopForPointer(ptr)             ((sqInt)(((char *)(ptr)) - (sqMemoryBase)))
> # define byteAt(oop)                    byteAtPointer(pointerForOop(oop))
> # define byteAtput(oop, val)            byteAtPointerput(pointerForOop(oop), (val))
> # define shortAt(oop)                   shortAtPointer(pointerForOop(oop))
> # define shortAtput(oop, val)           shortAtPointerput(pointerForOop(oop), (val))
> # define longAt(oop)                    longAtPointer(pointerForOop(oop))
> # define longAtput(oop, val)            longAtPointerput(pointerForOop(oop), (val))
> # define intAt(oop)                     intAtPointer(pointerForOop(oop))
> # define intAtput(oop, val)             intAtPointerput(pointerForOop(oop), (val))
> # define oopAt(oop)                     oopAtPointer(pointerForOop(oop))
> # define oopAtput(oop, val)             oopAtPointerput(pointerForOop(oop), (val))
> #endif
>
>



--
Best regards,
Igor Stasenko AKA sig.

Re: Direct object pointers vs indirect ones pros and cons

Bert Freudenberg


On 13.11.2010, at 03:17, Igor Stasenko wrote:

>
> On 12 November 2010 17:44, Bert Freudenberg <[hidden email]> wrote:
>>
>> On 12.11.2010, at 14:41, Igor Stasenko wrote:
>>
>>> I'm inviting you to make own version of benchmark
>>
>> I don't think this can be realistically simulated inside Squeak. But possibly you could change the macros in sqMemoryAccess.h to fake an object table access?
>>
>> I just tried that. Using tinyBenchmarks, byte code performance drops to 63% and sends to 78%.
>>
>> Now declaring that variable volatile might be overkill as it prevents all caching, but I couldn't quite figure out a more realistic declaration.
>>
> you mean that non-volatile like:
>
> int FakeObjTable = 0;
>
> could be optimized away by compiler?
> Well, since compiler compiles module by module (a separate C files),
> if you remove 'static'
> it can no longer able to optimize it to no-op, since it can't guess
> what may happen to this variable in another object file,
> since even if in one module there only a read-only access to it, some
> other module could contain a code which modifying it.

Yes, but that wouldn't build, because the linker would complain about FakeObjTable being defined globally more than once.

The better way would be to declare it extern in the header, and then define it for real in a single C file. interp.c would be a good one. You should try that :)

> So, i think this is the worst case performance slowdown. :)

I think so, too. It is significant.

> If we take into account that to get object location you need to do
> object table look only once,
> and then any consequent read/write operations on object won't require
> table lookup, this can be improved.
> Consider, for example, that to read ivar, interpreter reads & checks
> header, and only then ivar slot, so it should cost:
> 1 table lookup and 2 reads at object location.
> instead of 2 table lookups + 2 reads at object location.

But that's no different from what we have now. We only access memory when necessary. There would not be fewer lookups if you had an object table.

- Bert -

>
>> - Bert -
>>
>> #else
>> # ifndef FAKE_OBJ_TABLE
>> # define FAKE_OBJ_TABLE
>>  static volatile int FakeObjTable= 0;
>> # define OBJTABLELOOKUP(oop) (oop + FakeObjTable)
>> # endif
>>  /* Use macros when static inline functions aren't efficient. */
>> # define byteAtPointer(ptr)             ((sqInt)(*((unsigned char *)(OBJTABLELOOKUP(ptr)))))
>> # define byteAtPointerput(ptr, val)     ((sqInt)(*((unsigned char *)(OBJTABLELOOKUP(ptr)))= (unsigned char)(val)))
>> # define shortAtPointer(ptr)            ((sqInt)(*((short *)(OBJTABLELOOKUP(ptr)))))
>> # define shortAtPointerput(ptr, val)    ((sqInt)(*((short *)(OBJTABLELOOKUP(ptr)))= (short)(val)))
>> # define intAtPointer(ptr)              ((sqInt)(*((unsigned int *)(OBJTABLELOOKUP(ptr)))))
>> # define intAtPointerput(ptr, val)      ((sqInt)(*((unsigned int *)(OBJTABLELOOKUP(ptr)))= (int)(val)))
>> # define longAtPointer(ptr)             ((sqInt)(*((sqInt *)(OBJTABLELOOKUP(ptr)))))
>> # define longAtPointerput(ptr, val)     ((sqInt)(*((sqInt *)(OBJTABLELOOKUP(ptr)))= (sqInt)(val)))
>> # define oopAtPointer(ptr)              (sqInt)(*((sqInt *)OBJTABLELOOKUP(ptr)))
>> # define oopAtPointerput(ptr, val)      (sqInt)(*((sqInt *)OBJTABLELOOKUP(ptr))= (sqInt)val)
>> # define pointerForOop(oop)             ((char *)(sqMemoryBase + ((usqInt)(oop))))
>> # define oopForPointer(ptr)             ((sqInt)(((char *)(ptr)) - (sqMemoryBase)))
>> # define byteAt(oop)                    byteAtPointer(pointerForOop(oop))
>> # define byteAtput(oop, val)            byteAtPointerput(pointerForOop(oop), (val))
>> # define shortAt(oop)                   shortAtPointer(pointerForOop(oop))
>> # define shortAtput(oop, val)           shortAtPointerput(pointerForOop(oop), (val))
>> # define longAt(oop)                    longAtPointer(pointerForOop(oop))
>> # define longAtput(oop, val)            longAtPointerput(pointerForOop(oop), (val))
>> # define intAt(oop)                     intAtPointer(pointerForOop(oop))
>> # define intAtput(oop, val)             intAtPointerput(pointerForOop(oop), (val))
>> # define oopAt(oop)                     oopAtPointer(pointerForOop(oop))
>> # define oopAtput(oop, val)             oopAtPointerput(pointerForOop(oop), (val))
>> #endif
>>
>>
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.



Re: Direct object pointers vs indirect ones pros and cons

Levente Uzonyi-2
In reply to this post by Igor Stasenko
 


On Sat, 13 Nov 2010, Igor Stasenko wrote:

>
> On 12 November 2010 17:44, Bert Freudenberg <[hidden email]> wrote:
>>
>> On 12.11.2010, at 14:41, Igor Stasenko wrote:
>>
>>> I'm inviting you to make own version of benchmark
>>
>> I don't think this can be realistically simulated inside Squeak. But possibly you could change the macros in sqMemoryAccess.h to fake an object table access?
>>
>> I just tried that. Using tinyBenchmarks, byte code performance drops to 63% and sends to 78%.
>>
>> Now declaring that variable volatile might be overkill as it prevents all caching, but I couldn't quite figure out a more realistic declaration.
>>
> you mean that non-volatile like:
>
> int FakeObjTable = 0;
>
> could be optimized away by compiler?
> Well, since compiler compiles module by module (a separate C files),
> if you remove 'static'
> it can no longer able to optimize it to no-op, since it can't guess
> what may happen to this variable in another object file,
> since even if in one module there only a read-only access to it, some
> other module could contain a code which modifying it.
>
> So, i think this is the worst case performance slowdown. :)
>
> If we take into account that to get object location you need to do
> object table look only once,
> and then any consequent read/write operations on object won't require
> table lookup, this can be improved.
You can't do that if you want O(1) time for #become:.


Levente

> Consider, for example, that to read ivar, interpreter reads & checks
> header, and only then ivar slot, so it should cost:
> 1 table lookup and 2 reads at object location.
> instead of 2 table lookups + 2 reads at object location.
>
>> - Bert -
>>
>> #else
>> # ifndef FAKE_OBJ_TABLE
>> # define FAKE_OBJ_TABLE
>>  static volatile int FakeObjTable= 0;
>> # define OBJTABLELOOKUP(oop) (oop + FakeObjTable)
>> # endif
>>  /* Use macros when static inline functions aren't efficient. */
>> # define byteAtPointer(ptr)             ((sqInt)(*((unsigned char *)(OBJTABLELOOKUP(ptr)))))
>> # define byteAtPointerput(ptr, val)     ((sqInt)(*((unsigned char *)(OBJTABLELOOKUP(ptr)))= (unsigned char)(val)))
>> # define shortAtPointer(ptr)            ((sqInt)(*((short *)(OBJTABLELOOKUP(ptr)))))
>> # define shortAtPointerput(ptr, val)    ((sqInt)(*((short *)(OBJTABLELOOKUP(ptr)))= (short)(val)))
>> # define intAtPointer(ptr)              ((sqInt)(*((unsigned int *)(OBJTABLELOOKUP(ptr)))))
>> # define intAtPointerput(ptr, val)      ((sqInt)(*((unsigned int *)(OBJTABLELOOKUP(ptr)))= (int)(val)))
>> # define longAtPointer(ptr)             ((sqInt)(*((sqInt *)(OBJTABLELOOKUP(ptr)))))
>> # define longAtPointerput(ptr, val)     ((sqInt)(*((sqInt *)(OBJTABLELOOKUP(ptr)))= (sqInt)(val)))
>> # define oopAtPointer(ptr)              (sqInt)(*((sqInt *)OBJTABLELOOKUP(ptr)))
>> # define oopAtPointerput(ptr, val)      (sqInt)(*((sqInt *)OBJTABLELOOKUP(ptr))= (sqInt)val)
>> # define pointerForOop(oop)             ((char *)(sqMemoryBase + ((usqInt)(oop))))
>> # define oopForPointer(ptr)             ((sqInt)(((char *)(ptr)) - (sqMemoryBase)))
>> # define byteAt(oop)                    byteAtPointer(pointerForOop(oop))
>> # define byteAtput(oop, val)            byteAtPointerput(pointerForOop(oop), (val))
>> # define shortAt(oop)                   shortAtPointer(pointerForOop(oop))
>> # define shortAtput(oop, val)           shortAtPointerput(pointerForOop(oop), (val))
>> # define longAt(oop)                    longAtPointer(pointerForOop(oop))
>> # define longAtput(oop, val)            longAtPointerput(pointerForOop(oop), (val))
>> # define intAt(oop)                     intAtPointer(pointerForOop(oop))
>> # define intAtput(oop, val)             intAtPointerput(pointerForOop(oop), (val))
>> # define oopAt(oop)                     oopAtPointer(pointerForOop(oop))
>> # define oopAtput(oop, val)             oopAtPointerput(pointerForOop(oop), (val))
>> #endif
>>
>>
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>

Re: Direct object pointers vs indirect ones pros and cons

Igor Stasenko
 
2010/11/13 Levente Uzonyi <[hidden email]>:

>
>
>
> On Sat, 13 Nov 2010, Igor Stasenko wrote:
>
>>
>> On 12 November 2010 17:44, Bert Freudenberg <[hidden email]> wrote:
>>>
>>> On 12.11.2010, at 14:41, Igor Stasenko wrote:
>>>
>>>> I'm inviting you to make your own version of the benchmark
>>>
>>> I don't think this can be realistically simulated inside Squeak. But possibly you could change the macros in sqMemoryAccess.h to fake an object table access?
>>>
>>> I just tried that. Using tinyBenchmarks, byte code performance drops to 63% and sends to 78%.
>>>
>>> Now declaring that variable volatile might be overkill as it prevents all caching, but I couldn't quite figure out a more realistic declaration.
>>>
>> You mean that a non-volatile declaration like:
>>
>> int FakeObjTable = 0;
>>
>> could be optimized away by the compiler?
>> Well, since the compiler compiles module by module (separate C files),
>> if you remove 'static' it is no longer able to optimize the access
>> into a no-op: it can't guess what may happen to this variable in
>> another object file, because even if one module only reads it, some
>> other module could contain code which modifies it.
>>
>> So, I think this measures the worst-case performance slowdown. :)
>>
>> It can be improved if we take into account that the object table
>> needs to be consulted only once to get the object's location; any
>> subsequent read/write operations on the object then don't require
>> another table lookup.
>
> You can't do that if you want O(1) time for #become:.
>
Can you elaborate?
For instance, let's take a bytecode read. Should each bytecode read
also go through the object table?

>
> Levente
>
>> Consider, for example, that to read an ivar, the interpreter reads &
>> checks the header, and only then the ivar slot, so it should cost
>> 1 table lookup and 2 reads at the object location,
>> instead of 2 table lookups + 2 reads at the object location.
>>
>>> - Bert -
>>>




--
Best regards,
Igor Stasenko AKA sig.

Re: Direct object pointers vs indirect ones pros and cons

ungar
In reply to this post by Igor Stasenko
 
LOL!

Back in 1981 or -2 I built the first Smalltalk system WITHOUT an object table.
It used 32-bit direct pointers and Generation Scavenging (which I "invented").
First Smalltalk VM with direct pointers, first with generational GC, first with 32-bit OOPS.
It was called "Berkeley Smalltalk" or BS.
Peter (Deutsch) bet me a dinner about how much eliminating the OT would speed things up, and
when I surpassed Peter's estimate (I think it was 1.4), I collected one of the best dinners I have ever had.
Soon after, PS (Deutsch & Schiffman) ran rings around BS, but that's another story.

- David



On Nov 12, 2010, at 2:56 AM, Igor Stasenko wrote:

>
> On 12 November 2010 11:59, Stefan Marr <[hidden email]> wrote:
>>
>> Hi Igor:
>>
>> On 12 Nov 2010, at 10:32, Igor Stasenko wrote:
>>
>>> I would like to hear your opinion on that in context: what if you
>>> were to design a VM from scratch, with direct access to a highly
>>> optimizing compiler/JIT? What would be your choice?
>> I don't think you will get a satisfying answer to that question.
>> It might be that on certain processors the caches are big enough to actually hide the overhead of an object table in such a scenario.
>>
>> But, by definition, caches are always too small.
>>
>> I think we have still the source code of David's version of the RoarVM lying around that does not use an object table. With 'a bit of work' it would be possible to measure that overhead for our interpreter for a single core. So, if you feel like it, I could give you a hand here and there ;)
>>
> well, if that's not too much work to run a simple benchmark :)
>
>> Best regards
>> Stefan
>>
>>
>> --
>> Stefan Marr
>> Software Languages Lab
>> Vrije Universiteit Brussel
>> Pleinlaan 2 / B-1050 Brussels / Belgium
>> http://soft.vub.ac.be/~smarr
>> Phone: +32 2 629 2974
>> Fax:   +32 2 629 3525
>>
>>
>
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.


Re: Direct object pointers vs indirect ones pros and cons

Javier Burroni
In reply to this post by Igor Stasenko
Igor Stasenko wrote:
> the first bench is kind-of 'measure time to access directly to objects'
> the second one is 'measure indirect access'
> and the third measures the loop overhead.

Hi there,
I've just arrived at this thread (thanks to Mariano), and I wanted to share
some speculations.
Assume JIT'ed code with self (the oop of the actual object) in a register,
and selfID (the id of self in the object table) in a second register.
We have:
- accessing an ivar: no extra cost
- method lookup: one extra indirection
- sends with a MonomorphicInlineCache: no extra cost if implemented on an
  instance basis (checking against selfID); one indirection otherwise
- GC (MarkAndCompact): faster (due to the removal of the pointer-threading pass)

saludos
jb

Re: Direct object pointers vs indirect ones pros and cons

Igor Stasenko
 
On 26 October 2011 23:11, Javier Burroni <[hidden email]> wrote:

>
>
> Igor Stasenko wrote:
>>
>> the first bench is kind-of 'measure time to access directly to objects'
>> the second one is 'measure indirect access'
>> and third is measure a loop overhead.
>>
>>
>
> Hi there,
> I've just arrived to this thread (thanks to Mariano), and I wanted to share
> some speculations:
> Having JIT'ed code with self (the oop of the actual object) in a register,
> and selfID (the id of self in the object table) in a second register.

yes, but then I will ask you to compare the results with a JIT optimized
for direct pointers.. :)

> We have:
> accessing ivar: no extra cost
> method lookup:
> one extra indirection
> sends with MonomorphicInlineCache:
> no extra cost if implemented in an instance basis (checking against selfID).

hmm.. that doesn't make the inline cache effective.
Usually many different objects pass through a single call site, but
they have the same class; this is where a monomorphic IC shines.
If you change the cache to work on a per-instance basis, I think it
will make it less effective because of too many misses.

> One indirection otherwise
>
> GC (MarkAndCompact):
> Faster (due to the removal of the threading process).
>
yes, GC is faster because you don't need to rewrite pointers in each
object: with an object table, when you move object(s),
you only need to change the pointer in the object table and you're done.

> saludos
> jb
>
>
> --
> View this message in context: http://forum.world.st/Direct-object-pointers-vs-indirect-ones-pros-and-cons-tp3039203p3942123.html
> Sent from the Squeak VM mailing list archive at Nabble.com.
>



--
Best regards,
Igor Stasenko.

Re: Direct object pointers vs indirect ones pros and cons

Javier Burroni
 

>
> yes, but then I will ask you to compare the results with a JIT optimized
> for direct pointers.. :)
>
>> We have:
>> accessing ivar: no extra cost
>> method lookup:
>> one extra indirection
>> sends with MonomorphicInlineCache:
>> no extra cost if implemented on an instance basis (checking against selfID).
>
> hmm.. that doesn't makes inline cache to be effective.
> usually many different objects are passing via single call site but
> they having same class, this is where monomophic IC shines.
> if you change the cache to work on per-instance basis, i think it will
> make it less effective because of too much misses.
but you can have the two of them.
In the JITed prologue you may have something like:

entryPoint:
    cmp  selfID, nativizedSelfID      ; per-instance check first
    jnz  cmpClass
    mov  self, nativizedSelf          ; hit: reuse the cached direct pointer
    jmp  endOfPrologue
cmpClass:
    mov  self, [objectTable + selfID] ; one object-table lookup
    cmp  [self - 4], nativizedClass
    jz   endOfPrologue                ; patching code must be added here
    jmp  lookupAndJIT
endOfPrologue:

you add (mainly) one extra memory access, if the branch predictor helps

--
" To be is to do " ( Socrates )
" To be or not to be " ( Shakespeare )
" To do is to be " ( Sartre )
" Do be do be do " ( Sinatra )