WideString hash is way slower than ByteString hash.

Re: WideString hash is way slower than ByteString hash.

Igor Stasenko
On 15 May 2010 02:43, Igor Stasenko <[hidden email]> wrote:

> On 15 May 2010 02:35, Eliot Miranda <[hidden email]> wrote:
>> The cardinal rule of running benchmarks is to compare apples to apples.
>>  You've compared apples to oranges, i.e. an optimized reimplementation of
>> WideString>>hash that eliminates the mapping of codes to characters, against
>> the vanilla Squeak implementation.  You need to at least compare the NB
>> implementation against
>> WideString methods for comparison
>> fastHash
>> | stringSize hash low |
>> stringSize := self size.
>> hash := ByteString identityHash bitAnd: 16rFFFFFFF.
>> 1 to: stringSize do: [:pos |
>> hash := hash + (self wordAt: pos).
>> "Begin hashMultiply"
>> low := hash bitAnd: 16383.
>> hash := (16r260D * low + ((16r260D * (hash bitShift: -14) + (16r0065 * low)
>> bitAnd: 16383) * 16384)) bitAnd: 16r0FFFFFFF.
>> ].
>> ^ hash
>> | s n |
>> s := (WideString with: (Character value: 16r55E4)) , 'abcdefghijklmno'.
>> n := 100000.
>> { [1 to: n do: [:i| s fastHash. s fastHash. s fastHash. s fastHash. s
>> fastHash. s fastHash. s fastHash. s fastHash. s fastHash. s fastHash]]
>> timeToRun.
>>  [1 to: n do: [:i| s hash. s hash. s hash. s hash. s hash. s hash. s hash. s
>> hash. s hash. s hash]] timeToRun. }
>>      #(829 1254)
>> So your measurements tell us nothing about a general comparison of NB
>> against the Squeak VM or Cog.  They only demonstrate (unsurprisingly) that a
>> loop summing integers in an array goes PDQ.  On the other hand my numbers
>> showed Cog 10x faster than the Squeak interpreter when executing exactly the
>> same bytecode.
>
> Yes, of course you're right.
> But i didn't compared it with Cog, because i can't.
> And NB is not for translating bytecodes into native code,
> it is for authoring a primitives using native code.
> So, my comparison is how faster it could be , if implemented primitively.
>
> I can only imagine, how faster the whole thing would be if we
> cross-hatch NB and Cog,
> where Cog will serve as a smalltalk optimizer , and NB will serve for
> manual optimizations
> for heavy numerical crunching :)
>

It's now time to verify things, since now I can run native code under Cog:

|ss s t1 t2 t3 |
s := (WideString with: (Character value: 16r55E4)) , 'abcdefghijklmno'.
ss := (String with: $z) , 'abcdefghijklmno'.

10 timesRepeat: [ s := s , s. ss := ss , ss ].
self assert: (s hash = s nbHash).
t1 := [1000 timesRepeat: [s hash]] timeToRun.
t2 := [1000 timesRepeat: [s nbHash]] timeToRun.
t3 := [1000 timesRepeat: [ss hash]] timeToRun.
{ t1. t2. t3 }

Squeak VM:

 #(7929 89 125)

Cog VM:

 #(1256 88 99)


However, in tight loops, where the loop's payload is comparable to the
function's payload itself, Cog is the winner:

|s|
s := WideString with: (Character value: 16r55E4).
[10000000 timesRepeat: [s hash]] timeToRun.
1307

|s|
s := WideString with: (Character value: 16r55E4).
[ s hash ] bench
'2,840,000 per second.'

|s|
s := WideString with: (Character value: 16r55E4).
[10000000 timesRepeat: [s nbHash]] timeToRun.

1438

|s|
s := WideString with: (Character value: 16r55E4).
[ s nbHash ] bench
 '2,710,000 per second.'

Obviously, because it can't inline the native code generated by NB :)

--
Best regards,
Igor Stasenko AKA sig.

Re: WideString hash is way slower than ByteString hash.

Levente Uzonyi-2
On Thu, 30 Sep 2010, Igor Stasenko wrote:

> [...]
>
> Its now time to verify things, since now i can run native code under Cog:
>
> |ss s t1 t2 t3 |
> s := (WideString with: (Character value: 16r55E4)) , 'abcdefghijklmno'.
> ss := (String with: $z) , 'abcdefghijklmno'.
>
> 10 timesRepeat: [ s := s , s. ss := ss , ss ].
> self assert: (s hash = s nbHash).
> t1 := [1000 timesRepeat: [s hash]] timeToRun.
> t2 := [1000 timesRepeat: [s nbHash]] timeToRun.
> t3 := [1000 timesRepeat: [ss hash]] timeToRun.
> { t1. t2. t3 }
>
> Squeak VM:
>
> #(7929 89 125)
You should update your image, because WideString's hash is now ~2x faster
than before with SqueakVM. I get #(3822 119) for t1 and t3.

>
> Cog VM:
>
> #(1256 88 99)
>
>
> However, in tight loops, where loop's payload comparable to function's
> payload itself, Cog is a winner:
>
> |s|
> s := WideString with: (Character value: 16r55E4).
> [10000000 timesRepeat: [s hash]] timeToRun.
> 1307
>
> |s|
> s := WideString with: (Character value: 16r55E4).
> [ s hash ] bench
> '2,840,000 per second.'
>
> |s|
> s := WideString with: (Character value: 16r55E4).
> [10000000 timesRepeat: [s nbHash]] timeToRun.
>
> 1438
>
> |s|
> s := WideString with: (Character value: 16r55E4).
> [ s nbHash ] bench
> '2,710,000 per second.'
>
> obviously, because it can't inline a native code generated by NB :)
Can Cog inline message sends? I thought it can't ATM.
Btw, can you send a message from NB code?


Levente


Re: WideString hash is way slower than ByteString hash.

Eliot Miranda-2


2010/9/29 Levente Uzonyi <[hidden email]>
> [...]
>
> Can Cog inline message sends? I thought it can't ATM.

Right.  But the thought is that because Cog does inline machine code for certain primitives (actually it generates machine-code versions of core primitives), with a little work NB would be able to generate the machine code for primitives that Cog's code generator doesn't handle.

Igor will correct me if I'm wrong, but I think the idea is thus.  NB-generated code needs to be stored somewhere in a CompiledMethod where Cog can find it, along with some link-edit metadata to allow Cog to fix up data references in the code etc.  If Cog finds a method containing NB code it copies this code into the start of the method (where primitive code is compiled) and does some simple link-editing on the code.  Now the Cog method starts with NB code which operates like a primitive, jumping to the start of the Cog method proper if it fails.

If we're talking about NB FFI calls then there are additional complications.  A stack switch is required from Cog's stack pages to the real C stack, and back on return, etc.  But I'm encouraged that Igor has already got NB going on Cog.

> Btw, can you send a message from NB code?

As I understand it, no.

best
Eliot
Re: WideString hash is way slower than ByteString hash.

Igor Stasenko
On 30 September 2010 03:39, Eliot Miranda <[hidden email]> wrote:
>>
>> Can Cog inline message sends? I thought it can't ATM.
>
> Right.  But the thought is that because Cog does inline machine-code for
> certain primitives (actually it generates machine-code versions of core
> primitives) that with a little work NB would be able to generate the machine
> code for primitives that Cog's code generator doesn't handle.
> Igor will correct me if I'm wrong but I think the idea is thus.

You are right here.


>  NB-generated code needs to be stored somewhere in a CompiledMethod where
> Cog can find it,

This is trivial. It's stored right after the last bytecode in the compiled method :)

> along with some link-edit metadata to allow Cog to fix-up
> data references in the code etc.

No need. The generated code is position-agnostic, and it uses the
interpreterProxy API to speak with the VM. No position- or
object-memory-dependent data is stored there.
You can copy it anywhere you want and it will still work correctly.
During its interaction with the VM, the native code relies fully on the
interpreterProxy interface (like any other primitive).

The native code is a function of the form:

sqInt nativeFunction();

If it does not fail, it should return a valid oop.
So after the call you simply check whether the primitive failed:

        result := self fnPtr. "fnPtr is native code"
        self failed ifFalse: [
                interpreterProxy pop: argCount + 1 thenPush: result.
                ] ifTrue: [ lastError := interpreterProxy primitiveFailureCode].

But of course, for inlined code we can change that, in order to
generate even more optimal code :)

> If Cog finds a method containing NB code
> it copies this code into the start of the method (where primitive code is
> compiled) and does some simple link-editing on the code.  Now the Cog method
> starts with NB code which operates ;like a primitive, jumping to the start
> of the Cog method proper if it fails.

A sketch of the idea is the following:
 - reserve a special primitive number for 'a compiled method with
native code, which can be inlined'.
Then the Cog inliner can easily detect whether a method can be inlined,
after some extra checks: a method could have such a primitive but not
yet have any native code, since it is not generated yet. Then you simply
fail the prim - i.e. simply go and run the method interpreted.
This is what I am actually checking/doing each time before making a native call.

Once everything is right (it finds correct native code within a
method), it is then free to either make a call to it or inline it.

And as I understand it, from the image side we have to guarantee that
such special inlined code won't overflow the stack, so Cog won't need
to switch the stack pointer to the C stack before running it.

Or, if you extend interpreterProxy with something like
pleaseMakeSureIAmUsingCStack() and
pleaseMakeSureIAmUsingCogStackAgain(),
then NB could generate code which calls these things
before/after external calls,
and so Cog is free to inline even code which may contain external calls.

> If we're talking about NB FFI calls then there's additional compilations.  A
> stack switch is required from Cog's stack pages to the real C stack, and
> back on return, etc.  But I'm encouraged that Igor has already got NB going
> on Cog.

It wasn't hard.
I had to match the interpreterProxy interface with Cog's. (Btw, I
really miss getStackPointer(), which is currently not implemented in
the Cog VM; I would be happy to have this function available without
the Alien plugin :)
Second, I changed the two methods which were responsible for marshalling
Float instances, since Cog stores them in native format, while Squeak
uses big endian.

>>
>> Btw, can you send a message from NB code?
>
> As I understand it, no.

Of course it can't, because native code runs as a primitive.
A primitive can't do sends, because then, obviously, it would have to
enter the interpreter loop (via a call to interpret()), and once the
send completes, somehow continue to run the native code again.
So its implementation would be much more like the callback mechanism,
which is already there.

With NB you can create a native function with a callback, then make an
FFI callout to that function, and when it calls back, you can do a
'message send from NB code' or whatever you want.
So, roughly speaking, it could be seen as a way to do message sends
from native code :)




--
Best regards,
Igor Stasenko AKA sig.

Re: WideString hash is way slower than ByteString hash.

Eliot Miranda-2


On Wed, Sep 29, 2010 at 6:38 PM, Igor Stasenko <[hidden email]> wrote:
> On 30 September 2010 03:39, Eliot Miranda <[hidden email]> wrote:
>>>
>>> Can Cog inline message sends? I thought it can't ATM.
>>
>> Right.  But the thought is that because Cog does inline machine-code for
>> certain primitives (actually it generates machine-code versions of core
>> primitives) that with a little work NB would be able to generate the machine
>> code for primitives that Cog's code generator doesn't handle.
>> Igor will correct me if I'm wrong but I think the idea is thus.
>
> You are right here.
>
>
>>  NB-generated code needs to be stored somewhere in a CompiledMethod where
>> Cog can find it,
>
> This is trivial. Its stored right after last bytecode in compiled method :)

No, it's not trivial.  Cog doesn't know how to distinguish an NB trailer from any old trailer.  Personally I'd prefer to see the NB code in a special location in the literal frame (e.g. the first literal) using an instance of a class that can't be a literal.  Either that, or a bit in the CompiledMethod header should indicate that the trailer contains the code.  (Oh, but I see below you're thinking of a specific primitive number; that's good.)  Also, if the NB code /is/ in the trailer, how does Cog know where it ends, since presumably the trailer includes other stuff after the code too?


>> along with some link-edit metadata to allow Cog to fix-up
>> data references in the code etc.
>
> No need. The generated code is position agnostic, and its using
> interpreterProxy api for
> speaking with VM. No position/object memory dependent data is stored there.
> You can copy it anywhere you want and it will still work correctly.
> Native code during its interaction with VM fully relies on
> interpreterProxy interface (as any other primitive).

So the NB code uses absolute addressing to get hold of the interpreterProxy, right?  Depending on what the primitive is doing, using interpreterProxy may be OK or may have poor performance.  A little bit of metadata would allow NB direct access to certain variables and let it short-circuit the slow double-indirect calls through interpreterProxy.


> A native code is a function of kind:
>
> sqInt nativeFunction();
>
> if its not fails, it should return a valid oop.
> So, after call you simply check if prim not failed:
>
>        result := self fnPtr. "fnPtr is native code"
>        self failed ifFalse: [
>                interpreterProxy pop: argCount + 1 thenPush: result.
>                ] ifTrue: [ lastError := interpreterProxy primitiveFailureCode].
>
> but of course, for inlined code we can change that , in order to
> generate even more optimal code :)
>
>> If Cog finds a method containing NB code
>> it copies this code into the start of the method (where primitive code is
>> compiled) and does some simple link-editing on the code.  Now the Cog method
>> starts with NB code which operates ;like a primitive, jumping to the start
>> of the Cog method proper if it fails.
>
> A sketch idea is following:
>  - reserve a special primitive number for 'a compiled method with
> native code, which can be inlined'.

Yes.  That's nice.
 
> Then you can easily detect in Cog inliner, if method can be inlined
> after some extra checks:
> Method could have such primitive, but may not have a native code,
> since its not yet generated. Then you simply 'fail prim'
> - i.e. simply go and run method interpreted.
> This is what i am actually checking/doing each time before making a native call.
>
> Once everything right (it finds a correct native code within a
> method), its then free to either make a call to it or inline it.
>
> And as i understood, from image side, we have to guarantee that such
> special inlined code
> won't overflow the stack , so Cog won't need to switch stack pointer
> to C stack before running it.

Right.  But Cog stack pages have generous headroom, so you can count on 256 bytes.  But we'd have to specify this.  Hopefully 256 bytes is enough for something that runs on the Smalltalk stack, since one could argue that if it uses lots of stack it's probably going to run for a while, and so the overhead of switching to/from the C stack is affordable.  i.e. run only small performance-critical things on the Smalltalk stack.
 

> Or, if you extend interpreterProxy with something like
> pleaseMakeSureIAmUsingCStack() and
> pleaseMakeSureIAmUsingCogStackAgain()
> then NB could generate a code which will call these things
> before/after external calls,
> and so, Cog are free to inline even code which may contain external calls.

How about a primitive that says "run this code on the C stack" and one that says "run this on the native stack"?
 

>> If we're talking about NB FFI calls then there's additional compilations.  A
>> stack switch is required from Cog's stack pages to the real C stack, and
>> back on return, etc.  But I'm encouraged that Igor has already got NB going
>> on Cog.
>
> It wasn't hard.
> I have to match interpreterProxy interface with Cog's one. (Btw i am
> really miss the getStackPointer(),
> which currently not implemented in Cog VM, i would be happy to have
> this function available without Alien plugin :)

I think that's very easy to add :)
 
> Second, i changed two methods which were responsible for marchalling
> Float instances, since Cog
> storing them in native format, while Squeak in big endian.
>
>>>
>>> Btw, can you send a message from NB code?
>>
>> As I understand it, no.
>
> Of course it can't, because native code runs as a primitive.
> Primitive can't do sends, because then obviously, it has to enter
> interpreter loop ( via call to interpret()),
> and once send complete, somehow, it should again continue to run native code.
> So, its implementation will be much more like a callback mechanism,
> which already there.
>
> With NB you can create a native function with callback, then you
> making an FFI callout to that function
> and when it calls back, you can do 'message send from NB code' or
> whatever you want.
> So, roughly speaking it could be seen as a way to do a message sends
> from native code :)

cheers
Eliot
 

Re: WideString hash is way slower than ByteString hash.

Igor Stasenko
On 30 September 2010 04:53, Eliot Miranda <[hidden email]> wrote:

>
>
> On Wed, Sep 29, 2010 at 6:38 PM, Igor Stasenko <[hidden email]> wrote:
>>
>> On 30 September 2010 03:39, Eliot Miranda <[hidden email]> wrote:
>> >>
>> >> Can Cog inline message sends? I thought it can't ATM.
>> >
>> > Right.  But the thought is that because Cog does inline machine-code for
>> > certain primitives (actually it generates machine-code versions of core
>> > primitives) that with a little work NB would be able to generate the
>> > machine
>> > code for primitives that Cog's code generator doesn't handle.
>> > Igor will correct me if I'm wrong but I think the idea is thus.
>>
>> You are right here.
>>
>>
>> >  NB-generated code needs to be stored somewhere in a CompiledMethod
>> > where
>> > Cog can find it,
>>
>> This is trivial. Its stored right after last bytecode in compiled method
>> :)
>
> No, it's not trivial.  Cog doesn't know how to distinguish an NB trailer
> form any old trailer.

Obviously the NB plugin knows how to do that, and so does Cog, since
the plugin is part of the VM :)

> Personally I'd prefer to see the NB code in a special
> location in the literal frame (e.g. the first literal) using an instance of
> a class that can't be a literal.  Either that, or a bit in the
> CompiledMethod header should indicate that the trailer contains the code.
>  (Oh, but I see below you're thinking of a specific primitive number; that's
> good).  Also, if the NB code /is/ in the trailer, how does Cog know where it
> ends, sicne presumably the trailer includes other stuff after the trailer
> too.

I did the simplest possible thing to make it work, without changing the
parser/compiler to handle some additional pragma and insert a special
literal into the method.

If you're wondering how I detect the bounds of the native code located
inside a compiled method, you don't even need to go to the VM side to
see how I did it in my primitive.
Just install this changeset into your image:
http://code.google.com/p/nativeboost/downloads/detail?name=000-NativeCodeTrailers.cs&can=2&q=

There you can see how it encodes raw bytes into the trailer, and how it decodes them.

See, I've been there, done that. And all I learned while sitting on
squeak-dev and trying to sell my ideas is that if you want your stuff
to be included/used, then you need to make very small changes, and
don't you dare ever touch the VM!

So the NativeBoost project is my answer to this problem: I can do my
stuff without having to touch the VM.

Sure thing, if you want to integrate it in a nicer/more generic way, I
won't object.

>>
>> > along with some link-edit metadata to allow Cog to fix-up
>> > data references in the code etc.
>>
>> No need. The generated code is position agnostic, and its using
>> interpreterProxy api for
>> speaking with VM. No position/object memory dependent data is stored
>> there.
>> You can copy it anywhere you want and it will still work correctly.
>> Native code during its interaction with VM fully relies on
>> interpreterProxy interface (as any other primitive).
>
> So the NB code uses absolute addressing to get hold of the interpreterProxy,
> right?  Depending on what the primitive is doing using interpreterProxy may
> be OK or may have poor performance.  A little bit of metadata woudl allow NB
> direct access to certain variables and allow it to short-circuit the slow
> double-indirect calls through interpreterProxy.
>

Yes, I know very well about the drawbacks of such an approach.
Exposing some variables, like newMethod etc., could make the code a little faster.
A couple of years ago I did that for Exupery's VM-side code, so Exupery
was able to get a pointer to any of the VM's global state and use it in
generated code.
But I don't really miss that :) If you provide a primitive which can
give me a pointer to any/some VM variable by its name, then I will
simply alter the code generator to use it, if it's available.
And if not, then it will keep doing things the hard way.

About metadata: again, metadata for native code means more handling on
the VM side. This is what I was trying to avoid all along.
And this is the reason why I integrated NB with Cog so fast (except for
the callbacks stuff) :)

And btw, with some effort, I think it's possible to make NativeBoost
handle code relocation(s) on its own.
The code generator would simply emit a call to a special
checkForRelocationAndRelocateIfNeeded() routine, which would be the
first thing at the entry point of any native code which is not
position-agnostic.

Additionally, in NB, I started implementing an NBObjectFormat thingy,
so in some cases the code generator already uses it and can completely
avoid calls to the interpreter proxy, generating proper code for poking
inside objects directly.

So, I wouldn't bother with relocation(s) and optimization(s) at the
current stage. Only after we can see clearly whether there is
something which belongs on the VM side and can't be handled by the
native code itself :)

>> A native code is a function of kind:
>>
>> sqInt nativeFunction();
>>
>> if its not fails, it should return a valid oop.
>> So, after call you simply check if prim not failed:
>>
>>        result := self fnPtr. "fnPtr is native code"
>>        self failed ifFalse: [
>>                interpreterProxy pop: argCount + 1 thenPush: result.
>>                ] ifTrue: [ lastError := interpreterProxy
>> primitiveFailureCode].
>>
>> but of course, for inlined code we can change that , in order to
>> generate even more optimal code :)
>>
>> > If Cog finds a method containing NB code
>> > it copies this code into the start of the method (where primitive code
>> > is
>> > compiled) and does some simple link-editing on the code.  Now the Cog
>> > method
>> > starts with NB code which operates ;like a primitive, jumping to the
>> > start
>> > of the Cog method proper if it fails.
>>
>> A sketch idea is following:
>>  - reserve a special primitive number for 'a compiled method with
>> native code, which can be inlined'.
>
> Yes.  That's nice.
>
>>
>> Then you can easily detect in Cog inliner, if method can be inlined
>> after some extra checks:
>> Method could have such primitive, but may not have a native code,
>> since its not yet generated. Then you simply 'fail prim'
>> - i.e. simply go and run method interpreted.
>> This is what i am actually checking/doing each time before making a native
>> call.
>>
>> Once everything right (it finds a correct native code within a
>> method), its then free to either make a call to it or inline it.
>>
>> And as i understood, from image side, we have to guarantee that such
>> special inlined code
>> won't overflow the stack , so Cog won't need to switch stack pointer
>> to C stack before running it.
>
> Right.  But Cog stack pages have a generous headroom so you can count on 256
> bytes of headroom.  But we'd have to specify this.  Hopefully 256 bytes is
> enough for something that runs on the Smalltalk stack since one could argue
> that if it uses lots of stack its probably going to run for a whole and so
> the overhead of switching to/from teh C stack is affordable.  i.e. run only
> small performance-critical things on the Smalltalk stack.
>
Completely reasonable.

>>
>> Or, if you extend interpreterProxy with something like
>> pleaseMakeSureIAmUsingCStack() and
>> pleaseMakeSureIAmUsingCogStackAgain()
>> then NB could generate a code which will call these things
>> before/after external calls,
>> and so, Cog are free to inline even code which may contain external calls.
>
> How about a primitive that says "run this code on the C stack" and one that
> says "run this on the native stack"?

Maybe.
Then the developer needs to decide (figure out) whether his code uses
less than 256 bytes of stack or not, which could be controversial.
But I can't find a way to detect that automatically, since I allow
arbitrary code to be put into a method: no lint rules, no stack
checking whatsoever. :)
Freedom comes with responsibility.

>
>>
>> > If we're talking about NB FFI calls then there's additional
>> > compilations.  A
>> > stack switch is required from Cog's stack pages to the real C stack, and
>> > back on return, etc.  But I'm encouraged that Igor has already got NB
>> > going
>> > on Cog.
>>
>> It wasn't hard.
>> I have to match interpreterProxy interface with Cog's one. (Btw i am
>> really miss the getStackPointer(),
>> which currently not implemented in Cog VM, i would be happy to have
>> this function available without Alien plugin :)
>
> I think thats very easy to add :)
>
>>
>> Second, i changed two methods which were responsible for marchalling
>> Float instances, since Cog
>> storing them in native format, while Squeak in big endian.
>>
>> >>
>> >> Btw, can you send a message from NB code?
>> >
>> > As I understand it, no.
>>
>> Of course it can't, because native code runs as a primitive.
>> Primitive can't do sends, because then obviously, it has to enter
>> interpreter loop ( via call to interpret()),
>> and once send complete, somehow, it should again continue to run native
>> code.
>> So, its implementation will be much more like a callback mechanism,
>> which already there.
>>
>> With NB you can create a native function with callback, then you
>> making an FFI callout to that function
>> and when it calls back, you can do 'message send from NB code' or
>> whatever you want.
>> So, roughly speaking it could be seen as a way to do a message sends
>> from native code :)
>



--
Best regards,
Igor Stasenko AKA sig.
