[SqueakJS] Faster JIT ideas


[SqueakJS] Faster JIT ideas

codefrau
 
Hi all,

ideas for a faster SqueakJS JIT have been swirling around in my head for many years. This weekend I took the time to write some of them down:


Feedback welcome!

Vanessa

Re: [SqueakJS] Faster JIT ideas

Florin Mateoc-4
 
Hi Vanessa,

Sorry for the delay in responding - as somebody who has been inspired by your SqueakJS project, I think I should mention that I am working on a related project, for now tentatively called JsSqueak.
In addition to the inspiration provided by SqueakJS, it also scratches my longstanding itch about compiling (transpiling) Squeak.

I hesitated to talk about it, as it is still a work in progress. After small bits and pieces that I worked on over a long period, I had the opportunity to spend a significant, uninterrupted chunk of time on it last summer, when I was unemployed for 3 months, and I was able to make good progress. I was optimistically thinking of releasing a first version before the end of last year, but after I started my new job, progress on JsSqueak has slowed down significantly. I must confess that I (and especially my wife) hesitate to recreate that productive unemployed situation :)

I started with Squeak 4.5 - I already had code transforming Smalltalk code to a form more suitable for translation - and I also started with VMMakerJS-dtl.18 for the plugin generation part. Of course, I had to heavily modify it, since I had to get rid of the stack usage for arguments/receiver and returns.
Both of these big parts are working. I also implemented most numbered primitives by hand - they are inlined at generation time in the methods calling them.
I am also taking advantage of the latest and greatest additions to JavaScript. I am, of course, using classes, and the parallel class-side hierarchy is implemented using statics. To implement green threads/process switching, all translated methods are implemented as generator functions, and all calls are through yield* expressions. The preemption/interrupt check points are inlined. With this, a process switch is achieved by simply yield-ing (in the process/semaphore primitives).
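In sketch form (illustrative names only, not the actual JsSqueak code), the generator-based scheme works like this: every method is a generator, calls delegate through yield*, and a bare yield anywhere down the call chain suspends the whole logical process back to a scheduler:

```javascript
// Illustrative sketch only (names invented): every method is a generator,
// calls go through yield*, and a bare yield is a preemption point that
// suspends the entire logical process back to the scheduler.
function* step(id, i, log) {
    log.push(`${id}:${i}`);
    yield; // deep preemption point: suspends the whole process
}

function* worker(id, log) {
    for (let i = 0; i < 2; i++) {
        yield* step(id, i, log); // method call as yield* delegation
    }
}

// Round-robin scheduler: resumes each process until it finishes.
function runAll(processes) {
    while (processes.length > 0) {
        const proc = processes.shift();
        if (!proc.next().done) processes.push(proc); // suspended: requeue
    }
}

const log = [];
runAll([worker("A", log), worker("B", log)]);
// processes interleave: A:0, B:0, A:1, B:1
```

Note that the yield two frames down suspends the outer worker generator as well, which is exactly what makes a process switch possible from inside a semaphore primitive.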
With this, the Integer>>#benchFib method is translated (as a method in Number.prototype; there is one more, simpler implementation in BigInt) as:

*_benchFib() {
    if (Number.isSafeInteger(this.valueOf())) { // Effective (inherited or local) source for #benchFib in SmallInteger
        /*Handy send-heavy benchmark*/
        /*(result // seconds to run) = approx calls per second*/
        /* | r t |
           t := Time millisecondsToRun: [r := 26 benchFib].
           (r // 1000) // t*/
        /*138000 on a Mac 8100//100*/
        if (GlobalActivationCounter-- < 0) yield* GlobalCheckForInterrupts();

        return (yield* this._lt(2)).booleanValueOf("questionMark:colon:")
            ? (1)
            : (yield* (yield* (yield* (yield* this._sub(1))._benchFib())._add(yield* (yield* this._sub(2))._benchFib()))._add(1));
    } else { // No implementation for #benchFib in Float hierarchy, trigger a DNU
        return yield* super._benchFib();
    }
}

The top-level check for SmallIntegers is there because both SmallInteger and Float are mapped to Number.
The booleanValueOf call implements the mustBeBoolean machinery (it actually translates directly to a DNU, as is done nowadays in Squeak).
Of course, in Boolean, booleanValueOf is just an alias for valueOf.
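In sketch form (illustrative only, not the generated code), the mustBeBoolean idea looks like this: Booleans answer themselves, and anything else raises a doesNotUnderstand-style error carrying the selector:

```javascript
// Sketch of the mustBeBoolean machinery (illustrative, not JsSqueak's code):
// Booleans answer their primitive value; anything else triggers a
// doesNotUnderstand-style error for the given selector.
Boolean.prototype.booleanValueOf = Boolean.prototype.valueOf;

Object.prototype.booleanValueOf = function (selector) {
    // In the real system this would raise a Smalltalk DNU for `selector`.
    throw new TypeError(`${this} does not understand #${selector}`);
};

const cond = (3 < 4); // a genuine Boolean
console.log(cond.booleanValueOf("questionMark:colon:") ? "then" : "else"); // "then"
```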

As you can see, though, this is not terribly efficient yet, but there is room for optimization. With more work, the _lt call in this case could be replaced by the < operator, and even the _sub and _add calls could be optimized,
although not completely, since their results can morph into LargeInteger (mapped to BigInt).
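For illustration (a hypothetical helper, not the generated code), such a partially optimized addition could do the native + and fall back to the LargeInteger mapping only when the result leaves the safe-integer range:

```javascript
// Hypothetical sketch: optimized SmallInteger addition that morphs into
// LargeInteger (mapped to BigInt) when the result leaves the safe range.
function smallAdd(a, b) {
    const r = a + b;
    if (Number.isSafeInteger(r)) return r; // still a SmallInteger
    return BigInt(a) + BigInt(b);          // morph into LargeInteger
}

smallAdd(1, 2);                       // 3 (a Number)
smallAdd(Number.MAX_SAFE_INTEGER, 1); // 9007199254740992n (a BigInt)
```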

As hinted above, SmallInteger is mapped to Number (in the safeInteger range), Float is mapped to Number as well, and LargeInteger is mapped to BigInt.
BlockClosure is mapped to Function, Boolean is mapped to Boolean, Character is mapped to String, weak references are implemented via WeakRef. 
I have briefly considered also doing slightly higher-level mappings, for IdentitySet to Set and IdentityDictionary to Map, but this is not a priority.
The image is serialized somewhat like a JavaScript storeString. Processes and contexts are not included, or rather they are not brought back in on the JavaScript side; blocks are both stored and loaded.
Non-local returns, unwind blocks, resumable and restartable exceptions are implemented via JavaScript exception handling plus explicit handler block chains associated with the processes.
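Sketched in JavaScript (an illustrative shape, not the actual generated code), a non-local return out of a block can be implemented by throwing a marker that only the home activation catches:

```javascript
// Sketch of non-local return via JS exceptions (illustrative only):
// a block's ^-return throws a marker identifying its home activation.
class NonLocalReturn {
    constructor(home, value) { this.home = home; this.value = value; }
}

// Smalltalk-ish: a method whose inner block does a non-local return
// (^ n) out of the enclosing method activation.
function detectFirstEven(numbers) {
    const home = {}; // unique token for this activation
    try {
        numbers.forEach(n => {
            if (n % 2 === 0) throw new NonLocalReturn(home, n); // "^ n"
        });
    } catch (e) {
        if (e instanceof NonLocalReturn && e.home === home) return e.value;
        throw e; // belongs to an outer activation: keep unwinding
    }
    return null; // the block never returned
}

detectFirstEven([3, 7, 8, 5]); // → 8
```

The home-token test is what lets a non-local return pass through intermediate activations untouched, which is also where unwind blocks would hook in.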
The "image" starts with the global state loaded, but all processes are started from scratch instead of resumed. A non-UI image is thus trivially started.

One major todo left is hooking up the UI/browser. I did take vm.display.browser.js from SqueakJS and adapted the code to implement its numbered primitives, but I still have to work through squeak.js from the same project to initialize and hook up the display.

Florin



On Sun, Mar 7, 2021 at 11:17 PM Vanessa Freudenberg <[hidden email]> wrote:
 

Re: [SqueakJS] Faster JIT ideas

Florin Mateoc-4
 
I almost forgot: this is what my console looks like when trying to start via Chrome:

  • MemoryInfo {totalJSHeapSize: 2685131, usedJSHeapSize: 1667135, jsHeapSizeLimit: 4294705152}
startupBrowser.mjs:23
Loading initial glue code
startupBrowser.mjs:26
Loading translated Squeak image 4.5
startupBrowser.mjs:29
Loading generated classes (all the code, plus minimal reflection data)
startupBrowser.mjs:34
Elapsed 42074ms
startupBrowser.mjs:36
Loading manual overrides for the generated code
startupBrowser.mjs:39
Loading more glue code
startupBrowser.mjs:43
  • MemoryInfo {totalJSHeapSize: 195878416, usedJSHeapSize: 176798312, jsHeapSizeLimit: 4294705152}
startupBrowser.mjs:44
Loading serialized image state
SmalltalkGlobals.js:254
reified compiledMethod for method _notEqEq of class _ProtoObject has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _eqEq of class _ProtoObject has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _class of class _Object has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _lookup_ of class _Symbol class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _rehash of class _Symbol class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _allSymbolTablesDo_ of class _Symbol class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _compactSymbolTable of class _Symbol class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _allSymbols of class _Symbol class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _allSymbolTablesDo_after_ of class _Symbol class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _intern_ of class _Symbol class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _minVal of class _SmallInteger class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _maxVal of class _SmallInteger class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _installLowSpaceWatcher of class _SmalltalkImage has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _signal of class _Exception has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _return_ of class _Exception has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _resumeUnchecked_ of class _Exception has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _pass of class _Exception has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _retry of class _Exception has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254
reified compiledMethod for method _outer of class _Exception has not been set up, the method might be a manual JavaScript override
startupBrowser.mjs:47
Elapsed 105913ms
startupBrowser.mjs:49
  • MemoryInfo {totalJSHeapSize: 462667346, usedJSHeapSize: 438684470, jsHeapSizeLimit: 4294705152}
startupBrowser.mjs:51
  • MemoryInfo {totalJSHeapSize: 453213778, usedJSHeapSize: 373908346, jsHeapSizeLimit: 4294705152}
startupBrowser.mjs:52
Loading VM/scheduler
SmalltalkVM.js:81
cleaning up
SmalltalkVM.js:141
  • MemoryInfo {totalJSHeapSize: 409552989, usedJSHeapSize: 371929977, jsHeapSizeLimit: 4294705152}
SmalltalkVM.js:146
  • MemoryInfo {totalJSHeapSize: 409815133, usedJSHeapSize: 371860153, jsHeapSizeLimit: 4294705152}
SmalltalkVM.js:177
setting up the scheduler loop
SmalltalkVM.js:195
start known processes
SmalltalkVM.js:196
_Delay._startTimerEventLoop
SmalltalkVM.js:199
_Delay._startUp_(true)
SmalltalkVM.js:201
_ProcessorScheduler._startUp_(true)
SmalltalkVM.js:203
_WeakArray._startUp_(true)
SmalltalkVM.js:205
finished starting non-UI known processes
startUI_browser.mjs:25
starting EventSensor
startUI_browser.mjs:27
spawning primordial UI process
SmalltalkVM.js:180
entering new active process with priority 80
SmalltalkVM.js:184
primitive 86 semaphore wait
SmalltalkVM.js:180
entering new active process with priority 60
SmalltalkVM.js:184
primitive 86 semaphore wait
SmalltalkVM.js:180
entering new active process with priority 60
SmalltalkVM.js:184
primitive 85 semaphore signal
SmalltalkVM.js:180
entering new active process with priority 80
SmalltalkVM.js:184
primitive 86 semaphore wait
SmalltalkVM.js:180
entering new active process with priority 60
SmalltalkVM.js:184
primitive 86 semaphore wait
SmalltalkVM.js:180
entering new active process with priority 50
SmalltalkVM.js:184
primitive 86 semaphore wait
SmalltalkVM.js:180
entering new active process with priority 40
_DisplayScreen.js:843
Uncaught TypeError: Cannot read property 'width' of undefined

On Sun, Mar 14, 2021 at 9:18 PM Florin Mateoc <[hidden email]> wrote:

Re: [SqueakJS] Faster JIT ideas

codefrau
In reply to this post by Florin Mateoc-4
 
Hi Florin,

wow, that looks exciting!

This is indeed a much more thorough Squeak-to-JS mapping, whereas mine is more of a "traditional" VM. I love it!

Since my original post I implemented a mockup of "my" new JIT scheme:

[image.png: screenshot of the JIT mockup]
You can play with it here: https://codepen.io/codefrau/pen/JjbmVGw

Could you share the performance numbers you are seeing for your benchFib, in comparison to SqueakJS or my mockup? I am curious if yield* is the way to go. 

Thanks for sharing! And congrats on your new job. My progress is slow too, I only work on it some weekends. But then, I'm not in a hurry :)

Vanessa

On Sun, Mar 14, 2021 at 6:18 PM Florin Mateoc <[hidden email]> wrote:
 

Re: [SqueakJS] Faster JIT ideas

Florin Mateoc-4
 
I don't think the numbers can be meaningfully compared. The whole purpose of the yield* invocations is to enable process switching, which I don't think your jitted methods allow for. But then, are you merely comparing yield* invocations against direct invocations? The yield* ones will certainly be much slower.
Leaving aside my actual implementation, which also uses yield* for #<, #- and #+ (and would therefore be much slower than the code below, since those operators are surely well optimized by JS engines), we can just measure direct vs. yield* for the main recursive invocation:

Number.prototype.benchFib = function benchFib() {
  return this < 2 ? 1 : (this - 1).benchFib() + (this - 2).benchFib() + 1
}

var t = performance.now();
(30).benchFib();
performance.now() - t

gives 920, 911, 919, 898 ms on my laptop in Firefox

Versus

Number.prototype.benchFiby = function* benchFiby() {
   return this < 2 ? 1 : (yield* (this - 1).benchFiby()) + (yield* (this - 2).benchFiby()) + 1
}

var t = performance.now();
(30).benchFiby().next();
performance.now() - t

gives 2998, 3125, 3116, 3140 ms


On Mon, Mar 15, 2021 at 3:41 PM Vanessa Freudenberg <[hidden email]> wrote:
 

Re: [SqueakJS] Faster JIT ideas

codefrau
 
Thanks for that Florin, very helpful!

I'm curious why you need to make every send into a generator call, and not just rely on the one in your GlobalCheckForInterrupts()?

My implementation is intended to allow context switching that way:

if (--vm.interruptCheckCounter <= 0 && vm.handleDepthAndInterrupts(depth, thisProxy) === true) return false;

This is the same technique as in other VMs, where the actual check for a context switch is not done for every send, but only when the interruptCheckCounter goes below zero. In the first mockup on codepen, vm.handleDepthAndInterrupts prints the contents of all the contexts, proving that the information is accessible in case we need to context switch or reify the stack. My hope was that inside of handleDepthAndInterrupts() I could use yield* to do the actual context switching.
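In isolation, the counted check looks like this (a hedged sketch with illustrative names; the cheap path is one decrement-and-compare, and the expensive check runs only once per interval):

```javascript
// Sketch of the counted interrupt check (hypothetical names): a single
// decrement-and-compare per send; the expensive check runs only when
// the counter reaches zero, i.e. once per `interval` sends.
const vm = {
    interruptCheckCounter: 0,
    interval: 1000,
    checks: 0,
    handleInterrupts() {
        this.interruptCheckCounter = this.interval; // reset the counter
        this.checks++; // here: poll events, maybe switch processes
    }
};

function send(/* receiver, selector, args */) {
    if (--vm.interruptCheckCounter <= 0) vm.handleInterrupts();
    // ...perform the actual send here...
}

for (let i = 0; i < 5000; i++) send();
// vm.checks is now 5: one full check per 1000 sends
```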

My second, exception based mockup on codepen uses the same approach, except that when a context switch is needed it would throw, unwinding the stack fully, and creating actual context objects along the way. I have not mocked up that part yet (because it would need a mockup interpreter too, to continue after unwind), but you can see the unwind working correctly by uncommenting the line with
// throw Error("unwind")
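The unwind idea can be sketched like this (a hypothetical shape of the mechanism, not the actual codepen code): each activation wraps its body in a try/catch that, on the way out, appends a context object to the in-flight exception before rethrowing:

```javascript
// Sketch of exception-based unwinding that reifies contexts on the way
// out (hypothetical names, not the actual mockup code).
class Unwind extends Error {
    constructor() { super("unwind"); this.contexts = []; }
}

function activate(name, args, body) {
    try {
        return body();
    } catch (e) {
        if (e instanceof Unwind) e.contexts.push({ name, args }); // reify this frame
        throw e; // keep unwinding toward the trampoline
    }
}

function run() {
    return activate("outer", [1], () =>
        activate("inner", [2], () => { throw new Unwind(); }));
}

try { run(); } catch (e) {
    console.log(e.contexts.map(c => c.name)); // inner first, then outer
}
```

An interpreter at the trampoline could then continue execution from the reified contexts, which is the part not mocked up yet.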

I tried your benchFib vs benchFiby on Chrome, which seems to optimize generators a lot better than Firefox. On Chrome the overhead is just 50% or so, vs 300% on Firefox. Safari appears to be between the two.

I will need to make more complete mockups before deciding on a design. Especially how closures would be handled. Does anyone have a tiny one-method benchmark that would highlight closure performance?

Vanessa

On Mon, Mar 15, 2021 at 6:20 PM Florin Mateoc <[hidden email]> wrote:
 
I don't think the numbers could be meaningfully compared. The whole purpose of the yield* invocations is to enable process switching, which I don't think your jitted methods allow for. But then are you merely comparing yield* invocations against direct invocations? For sure, the yield* ones will be  much slower.
Let alone my implementation, which also uses yield* for #< , #- and #+, so it would be much slower than below as operators are surely well optimized by JS, we can just measure direct vs yield* for the main recursive invocation:

Number.prototype.benchFib = function benchFib() {
  return this < 2 ? 1 : (this - 1).benchFib() + (this - 2).benchFib() + 1
}

var t = performance.now();
(30).benchFib();
performance.now() - t

gives on my laptop in Firefox 920, 911, 919, 898

Versus

Number.prototype.benchFiby = function* benchFiby() {
   return this < 2 ? 1 : (yield* (this - 1).benchFiby()) + (yield* (this - 2).benchFiby()) + 1
}

var t = performance.now();
(30).benchFiby().next();
performance.now() - t

gives 2998, 3125, 3116, 3140


On Mon, Mar 15, 2021 at 3:41 PM Vanessa Freudenberg <[hidden email]> wrote:
 
Hi Florin,

wow, that looks exciting!

This is indeed a much more thorough Squeak-to-JS mapping, where mine is more of a "traditional" VM. I love it!

Since my original post I implemented a mockup of "my" new JIT scheme:

image.png
You can play with it here: https://codepen.io/codefrau/pen/JjbmVGw

Could you share the performance numbers you are seeing for your benchFib, in comparison to SqueakJS or my mockup? I am curious if yield* is the way to go. 

Thanks for sharing! And congrats on your new job. My progress is slow too, I only work on it some weekends. But then, I'm not in a hurry :)

Vanessa

On Sun, Mar 14, 2021 at 6:18 PM Florin Mateoc <[hidden email]> wrote:
 
With this scheme, the Integer>>#benchFib method is translated (as a method in Number.prototype; there is another, simpler implementation in BigInt) as:

*_benchFib() {
    if (Number.isSafeInteger(this.valueOf())) { // Effective (inherited or local) source for #benchFib in SmallInteger
        /* Handy send-heavy benchmark */
        /* (result // seconds to run) = approx calls per second */
        /* | r t |
           t := Time millisecondsToRun: [r := 26 benchFib].
           (r // 1000) // t */
        /* 138000 on a Mac 8100//100 */
        if (GlobalActivationCounter-- < 0) yield* GlobalCheckForInterrupts();

        return (yield* this._lt(2)).booleanValueOf("questionMark:colon:")
            ? (1)
            : (yield* (yield* (yield* (yield* this._sub(1))._benchFib())._add(yield* (yield* this._sub(2))._benchFib()))._add(1));
    } else { // No implementation for #benchFib in Float hierarchy, trigger a DNU
        return yield* super._benchFib();
    }
}

The top-level check for safe integers is there because both SmallInteger and Float are mapped to Number.
The booleanValueOf call implements the mustBeBoolean machinery (it actually translates directly to a DNU, as is done nowadays in Squeak).
Of course, in Boolean, booleanValueOf is just an alias for valueOf.

As you can see, though, this is not terribly efficient, but there is room for improvement/optimizations. With more work, in this case, the _lt call could be replaced by the < operator, and even the _sub and _add calls could be optimized, although not completely, since their result can morph into a LargeInteger (mapped to BigInt).

As hinted above, SmallInteger is mapped to Number (in the safeInteger range), Float is mapped to Number as well, and LargeInteger is mapped to BigInt.
BlockClosure is mapped to Function, Boolean is mapped to Boolean, Character is mapped to String, weak references are implemented via WeakRef. 
I have briefly considered also doing slightly higher-level mappings, for IdentitySet to Set and IdentityDictionary to Map, but this is not a priority.
The image is serialized sort of like a JavaScript storeString. No processes or contexts though, or rather they are not brought back in on the JavaScript side. Blocks are both stored and loaded.
Non-local returns, unwind blocks, resumable and restartable exceptions are implemented via JavaScript exception handling plus explicit handler block chains associated with the processes.
The "image" starts with the global state loaded, but all processes are started from scratch instead of resumed. A non-UI image is thus trivially started.

One major todo left is hooking up the UI/browser. I did take vm.display.browser.js from SqueakJS and adapted the code to implement its numbered primitives, but I still have to work through squeak.js from the same project to initialize and hook up the display.

Florin



Re: [SqueakJS] Faster JIT ideas

Florin Mateoc-4
Well, who makes the generator call inside GlobalCheckForInterrupts if the method hosting the GlobalCheckForInterrupts call was not called via yield*? yield* needs to traverse the whole execution stack. If you do a top-level next(), no inner invocations will occur if they are not generator calls.

Sure, you can print the stack, but have you checked that you can switch and then come back and resume an interrupted one?

Anyway, thank you for the good news with Chrome. I will also check with Node.

On Mon, Mar 15, 2021 at 10:28 PM Vanessa Freudenberg <[hidden email]> wrote:
 
Thanks for that Florin, very helpful!

I'm curious why you need to make every send into a generator call, and not just rely on the one in your GlobalCheckForInterrupts()?

My implementation is intended to allow context switching that way:

if (--vm.interruptCheckCounter <= 0 && vm.handleDepthAndInterrupts(depth, thisProxy) === true) return false;

This is the same technique as in other VMs, where the actual check for a context switch is not done for every send, but only when the interruptCheckCounter goes below zero. In the first mockup on codepen, vm.handleDepthAndInterrupts prints the contents of all the contexts, proving that the information is accessible in case we need to context switch or reify the stack. My hope was that inside of handleDepthAndInterrupts() I could use yield to do the actual context switching.
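This counter-based check can be sketched in isolation (a hypothetical, self-contained sketch; the names only loosely follow the snippet above):

```javascript
// Hypothetical sketch of a counter-based interrupt check: the common case
// is one decrement and compare per send; the (potentially expensive)
// handler only runs when the counter underflows.
const vm = {
  interruptCheckCounter: 1000,
  interruptCheckPeriod: 1000,
  switchRequested: false,
  handleDepthAndInterrupts(depth) {
    this.interruptCheckCounter = this.interruptCheckPeriod; // reset the counter
    return this.switchRequested; // true => the caller should stop and switch
  },
};

function send(depth) {
  // cheap per-send check, as in the snippet above
  if (--vm.interruptCheckCounter <= 0 && vm.handleDepthAndInterrupts(depth) === true)
    return false; // a context switch was requested
  return true;    // keep running
}
```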

My second, exception-based mockup on codepen uses the same approach, except that when a context switch is needed it would throw, unwinding the stack fully and creating actual context objects along the way. I have not mocked up that part yet (because it would need a mockup interpreter too, to continue after the unwind), but you can see the unwind working correctly by uncommenting the line with
// throw Error("unwind")
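An exception-based unwind that reifies contexts on the way out might look like this minimal sketch (hypothetical names; an illustration of the idea, not the actual mockup):

```javascript
// Sketch: each jitted frame catches the unwind, records its state as a
// context object, and rethrows, so whoever catches it at the top ends up
// with the reified stack, innermost frame first.
class Unwind extends Error {
  constructor() {
    super("unwind");
    this.contexts = []; // filled in as the stack unwinds
  }
}

function frame(method, depth, body) {
  try {
    return body();
  } catch (e) {
    if (e instanceof Unwind) e.contexts.push({ method, depth });
    throw e; // keep unwinding toward the trampoline/interpreter
  }
}

// Simulate a nested activation that decides to context-switch:
try {
  frame("outer", 0, () => frame("inner", 1, () => { throw new Unwind(); }));
} catch (e) {
  globalThis.capturedContexts = e.contexts;
  // → [{ method: "inner", depth: 1 }, { method: "outer", depth: 0 }]
}
```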

Re: [SqueakJS] Faster JIT ideas

Florin Mateoc-4
Perhaps better expressed: since benchFiby is a generator function, the call benchFiby() does not execute its body (so it cannot reach any potential yields inside it); it just returns a generator object, so we have to make all the calls generator calls.

Re: [SqueakJS] Faster JIT ideas

codefrau
In reply to this post by Florin Mateoc-4
On Mon, Mar 15, 2021 at 7:57 PM Florin Mateoc <[hidden email]> wrote:

> Perhaps better expressed: since benchFiby is a generator function, the call benchFiby() does not execute its contents (so it cannot reach any potential yields inside it), it just returns a generator, so we have to make all the calls generator calls

Ah, that's a JS thing I had not understood yet, that all calls to generator functions need to be made using yield*, and you can only use yield* inside of a generator function.
 
> Well, who makes the generator call inside GlobalCheckForInterrupts if the method hosting the GlobalCheckForInterrupts call was not called via yield*?

I think that would be the trampoline function that goes from interpreted code to compiled code. That same function needs to handle grabbing function args off the context's stack, passing them as arguments to the jitted method, and once that returns, popping the stack + pushing the result.

> yield* needs to traverse the whole execution stack. If you do a top-level next(), no inner invocations will occur if they are not generator calls.

Okay. I'll have to think about that. My hope was that all the regular execution can happen using regular function invocations and only process switching would use yield. But you're right, we need to use yield* all the way down. Hope that's not too expensive ... or maybe unwinding to process switch is the better tradeoff after all.

> Sure, you can print the stack, but have you checked that you can switch and then come back and resume an interrupted one?

Not yet, I'm still designing it and trying things. 
 
> Anyway, thank you for the good news with Chrome. I will also check with Node

Meanwhile, I made a little scheduler just to learn how to use yield*. Works nicely:
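A minimal round-robin scheduler in that spirit might look like the following hypothetical sketch (not the scheduler referred to above), where each "process" is a generator and a bare yield is a voluntary switch:

```javascript
// Hypothetical sketch of a round-robin scheduler over generators.
// Each "process" is a generator function; a bare yield gives up its turn.
function runRoundRobin(generators) {
  const procs = generators.map(gen => ({ gen, done: false }));
  const order = []; // record of yielded values, for illustration
  while (procs.some(p => !p.done)) {
    for (const p of procs) {
      if (p.done) continue;
      const step = p.gen.next(); // run this process until its next yield
      if (step.done) p.done = true;
      else order.push(step.value);
    }
  }
  return order;
}

function* proc(name, slices) {
  for (let i = 0; i < slices; i++) yield `${name}${i}`; // yield = switch
}

runRoundRobin([proc("A", 2), proc("B", 2)]);
// → ["A0", "B0", "A1", "B1"]
```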

Vanessa
 
On Mon, Mar 15, 2021 at 10:28 PM Vanessa Freudenberg <[hidden email]> wrote:
 
Thanks for that Florin, very helpful!

I'm curious why you need to make every send into a generator call, and not just rely on the one in your GlobalCheckForInterrupts()?

My implementation is intended to allow context switching that way:

if (--vm.interruptCheckCounter <= 0 && vm.handleDepthAndInterrupts(depth, thisProxy) === true) return false;

This is the same technique as in other VMs, where the actual check for context switch is not done for every send, but only when the interruptCheckCounter goes below zero. In the first mockup on codepen, vm.handleDepthAndInterrupts prints the contents of all the contexts, proving that the information is accessible in case we need to context switch or reify the stack. My hope was that inside of handleDepthAndInterrupts() I could use *yield to do the actual context switching. 

My second, exception based mockup on codepen uses the same approach, except that when a context switch is needed it would throw, unwinding the stack fully, and creating actual context objects along the way. I have not mocked up that part yet (because it would need a mockup interpreter too, to continue after unwind), but you can see the unwind working correctly by uncommenting the line with
// throw Error("unwind")

I tried your benchFib vs benchFiby on Chrome, which seems to optimize generators a lot better than Firefox. On Chrome the overhead is just 50% or so, vs 300% on Firefox. Safari appears to be between the two.

I will need to make more complete mockups before deciding on a design. Especially how closures would be handled. Does anyone have a tiny one-method benchmark that would highlight closure performance?

Vanessa

On Mon, Mar 15, 2021 at 6:20 PM Florin Mateoc <[hidden email]> wrote:
 
I don't think the numbers could be meaningfully compared. The whole purpose of the yield* invocations is to enable process switching, which I don't think your jitted methods allow for. But then are you merely comparing yield* invocations against direct invocations? For sure, the yield* ones will be  much slower.
Let alone my implementation, which also uses yield* for #< , #- and #+, so it would be much slower than below as operators are surely well optimized by JS, we can just measure direct vs yield* for the main recursive invocation:

Number.prototype.benchFib = function benchFib() {
  return this < 2 ? 1 : (this - 1).benchFib() + (this - 2).benchFib() + 1
}

var t = performance.now();
(30).benchFib();
performance.now() - t

gives on my laptop in Firefox 920, 911, 919, 898

Versus

Number.prototype.benchFiby = function* benchFiby() {
   return this < 2 ? 1 : (yield* (this - 1).benchFiby()) + (yield* (this - 2).benchFiby()) + 1
}

var t = performance.now();
(30).benchFiby().next();
performance.now() - t

gives 2998, 3125, 3116, 3140


On Mon, Mar 15, 2021 at 3:41 PM Vanessa Freudenberg <[hidden email]> wrote:
 
Hi Florin,

wow, that looks exciting!

This is indeed a much more thorough Squeak-to-JS mapping, where mine is more of a "traditional" VM. I love it!

Since my original post I implemented a mockup of "my" new JIT scheme:

image.png
You can play with it here: https://codepen.io/codefrau/pen/JjbmVGw

Could you share the performance numbers you are seeing for your benchFib, in comparison to SqueakJS or my mockup? I am curious if yield* is the way to go. 

Thanks for sharing! And congrats on your new job. My progress is slow too, I only work on it some weekends. But then, I'm not in a hurry :)

Vanessa

On Sun, Mar 14, 2021 at 6:18 PM Florin Mateoc <[hidden email]> wrote:
 
Hi Vanessa,

Sorry for the delay in responding - as somebody who has been inspired by your SqueakJS project, I think I should mention that I am working on a related project, for now tentatively called JsSqueak.
In addition to the inspiration provided by SqueakJS, it also scratches my longstanding itch about compiling (transpiling) Squeak.

I hesitated to talk about it, as it is still a work in progress - after small bits and pieces that I worked on over a long period, I had the opportunity to spend a significant and uninterrupted chunk of time on it last summer, when I was unemployed for 3 months, and I was able to make good progress. I was optimistically thinking of releasing a first version before the end of last year, but after I started working on my new job, progress on JsSqueak has slowed down significantly. I must confess that I (and especially my wife) hesitate in recreating that productive unemployed situation :)

I started with Squeak 4.5 - I already had code transforming Smalltalk code to a form more suitable for translation - and I also started with VMMakerJS-dtl.18 for the plugin generation part. Of course, I had to heavily modify it, since I have to get rid of the stack usage for arguments/receiver and returns.
Both of these big parts are working. I also implemented most numbered primitives by hand - they are inlined at generation time in the methods calling them.
I am also taking advantage of the latest and greatest additions to JavaScript. I am, of course, using classes, and the parallel class-side hierarchy is implemented using statics. To implement green threads/process switching, all translated methods are implemented as generator functions, and all calls are through yield* expressions. The preemption/interrupt check points are inlined. With this, a process switch is achieved by simply yield-ing (in the process/semaphore primitives).
With this, the Integer>>#benchFib method is translated (as a method in Number.prototype, there is one more, simpler, implementation in BigInt) as:

*_benchFib() {
if (Number.isSafeInteger(this.valueOf())) { // Effective (inherited or local) source for #benchFib in SmallInteger
/*Handy send-heavy benchmark*/
/*(result // seconds to run) = approx calls per second*/
/* | r t |
t := Time millisecondsToRun: [r := 26 benchFib].
(r // 1000) // t*/
/*138000 on a Mac 8100//100*/
if (GlobalActivationCounter-- < 0) yield* GlobalCheckForInterrupts();

return (yield* this._lt( 2)).booleanValueOf("questionMark:colon:") ? (1) : (yield* (yield* (yield* (yield* this._sub( 1))._benchFib())._add( yield* (yield* this._sub( 2))._benchFib()))._add( 1));
} else // No implementation for #benchFib in Float hierarchy, trigger a DNU
return yield* super._benchFib()
}

The top-level check for SmallIntegers is there because both SmallInteger and Float are mapped to Number.
The booleanValueOf call implements the mustBeBoolean machinery (it actually translates directly to a DNU, as is done nowadays in Squeak).
Of course, in Boolean, booleanValueOf is just an alias for valueOf.

As you can see, though, this is not terribly efficient, but there is room for improvement/optimization. With more work, in this case, the _lt call could be replaced by the < operator, and even the _sub and _add calls could be optimized, although not completely, since their results can morph into LargeInteger (mapped to BigInt).
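[Editorial sketch] That last point can be made concrete. A hypothetical optimized SmallInteger #+ (the helper name is invented, not JsSqueak code) would answer a plain Number while the result stays in the safe-integer range, and morph into BigInt otherwise:

```javascript
// Hypothetical sketch (not actual JsSqueak code): an optimized
// SmallInteger #+ that answers a plain Number while the result stays in
// the safe-integer range, and morphs into BigInt (LargeInteger) otherwise.
function smallIntAdd(a, b) {
  const sum = a + b;
  if (Number.isSafeInteger(sum)) return sum;  // still a SmallInteger
  return BigInt(a) + BigInt(b);               // morphs into LargeInteger
}

console.log(smallIntAdd(1, 2));                       // 3
console.log(smallIntAdd(Number.MAX_SAFE_INTEGER, 1)); // 9007199254740992n
```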

As hinted above, SmallInteger is mapped to Number (in the safe-integer range), Float is mapped to Number as well, and LargeInteger is mapped to BigInt.
BlockClosure is mapped to Function, Boolean to Boolean, and Character to String; weak references are implemented via WeakRef.
I have briefly considered slightly higher-level mappings as well, e.g. IdentitySet to Set and IdentityDictionary to Map, but this is not a priority.
The image is serialized somewhat like a JavaScript storeString. No processes or contexts, though - or rather, they are not brought back on the JavaScript side. Blocks are both stored and loaded.
Non-local returns, unwind blocks, and resumable/restartable exceptions are implemented via JavaScript exception handling plus explicit handler-block chains associated with the processes.
The "image" starts with the global state loaded, but all processes are started from scratch rather than resumed. A non-UI image thus starts trivially.
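[Editorial sketch] The non-local-return part of that mechanism can be sketched as follows. All names are hypothetical, and the real implementation also threads handler-block chains through the processes; this only shows the core idea of mapping a block's "^value" onto a JS throw that the home activation catches:

```javascript
// Hypothetical sketch: a Smalltalk non-local return (^ inside a block)
// implemented with JavaScript exception handling.
class NonLocalReturn extends Error {
  constructor(home, value) {
    super("non-local return");
    this.home = home;    // token identifying the home activation
    this.value = value;  // the value being returned
  }
}

// Shape of a translated method whose block does "^each" back to the
// enclosing method activation.
function firstGreaterThan(list, limit) {
  const home = {};                    // unique token for this activation
  const block = (each) => {           // BlockClosure mapped to Function
    if (each > limit) throw new NonLocalReturn(home, each); // "^each"
  };
  try {
    list.forEach(block);
    return null;                      // the block never returned non-locally
  } catch (e) {
    if (e instanceof NonLocalReturn && e.home === home) return e.value;
    throw e;                          // someone else's unwind: keep going
  }
}

console.log(firstGreaterThan([1, 5, 9, 2], 4)); // 5
```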

One major to-do left is hooking up the UI/browser. I took vm.display.browser.js from SqueakJS and adapted its code to implement the corresponding numbered primitives, but I still have to work through squeak.js from the same project to initialize and hook up the display.

Florin




Re: [SqueakJS] Faster JIT ideas

codefrau
 
On Tue, Mar 16, 2021 at 1:38 AM Vanessa Freudenberg <[hidden email]> wrote:
Meanwhile, I made a little scheduler just to learn how to use yield*. Works nicely:

And here is a JIT mockup using that scheduler:

It's executing "5 benchFib", "4 benchFib", and "3 benchFib" in three separate processes, switching every 3 sends just for testing, and printing all contexts.

No benchmarks yet.

Vanessa
 
On Mon, Mar 15, 2021 at 10:28 PM Vanessa Freudenberg <[hidden email]> wrote:
 
Thanks for that Florin, very helpful!

I'm curious why you need to make every send into a generator call, and not just rely on the one in your GlobalCheckForInterrupts()?

My implementation is intended to allow context switching that way:

if (--vm.interruptCheckCounter <= 0 && vm.handleDepthAndInterrupts(depth, thisProxy) === true) return false;

This is the same technique as in other VMs, where the actual check for a context switch is not done on every send, but only when the interruptCheckCounter goes below zero. In the first mockup on codepen, vm.handleDepthAndInterrupts prints the contents of all the contexts, proving that the information is accessible in case we need to context switch or reify the stack. My hope was that inside handleDepthAndInterrupts() I could use yield* to do the actual context switching.

My second, exception-based mockup on codepen uses the same approach, except that when a context switch is needed it would throw, unwinding the stack fully and creating actual context objects along the way. I have not mocked up that part yet (it would need a mockup interpreter too, to continue after the unwind), but you can see the unwind working correctly by uncommenting the line with
// throw Error("unwind")
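[Editorial sketch] The unwinding idea can be sketched like this (hypothetical shape, not the actual codepen mockup): when a switch is due, throw; every activation catches, records a context object, and rethrows, so the whole JS stack unwinds into a list of reified contexts an interpreter could later resume.

```javascript
// Hypothetical sketch of exception-based stack reification.
class Unwind extends Error {
  constructor() { super("unwind"); this.contexts = []; }
}

let sendsUntilSwitch = 3;  // force a context switch after a few sends

function benchFib(n) {
  if (--sendsUntilSwitch < 0) throw new Unwind();
  try {
    return n < 2 ? 1 : benchFib(n - 1) + benchFib(n - 2) + 1;
  } catch (e) {
    if (e instanceof Unwind) e.contexts.push({ method: "benchFib", n }); // reify
    throw e;  // keep unwinding toward the scheduler
  }
}

let reified = null;
try {
  benchFib(10);
} catch (e) {
  reified = e.contexts.map((c) => c.n);  // innermost activation first
}
console.log(reified); // [ 8, 9, 10 ]
```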

I tried your benchFib vs benchFiby on Chrome, which seems to optimize generators a lot better than Firefox. On Chrome the overhead is just 50% or so, vs 300% on Firefox. Safari appears to be between the two.

I will need to make more complete mockups before deciding on a design. Especially how closures would be handled. Does anyone have a tiny one-method benchmark that would highlight closure performance?

Vanessa

On Mon, Mar 15, 2021 at 6:20 PM Florin Mateoc <[hidden email]> wrote:
 
I don't think the numbers can be meaningfully compared. The whole purpose of the yield* invocations is to enable process switching, which I don't think your jitted methods allow for. So you would merely be comparing yield* invocations against direct invocations - for sure, the yield* ones will be much slower.
Leaving aside my implementation, which also uses yield* for #<, #- and #+ and so would be even slower than the code below (the operators are surely well optimized by JS engines), we can just measure direct vs. yield* for the main recursive invocation:

Number.prototype.benchFib = function benchFib() {
  return this < 2 ? 1 : (this - 1).benchFib() + (this - 2).benchFib() + 1
}

var t = performance.now();
(30).benchFib();
performance.now() - t

gives, on my laptop in Firefox: 920, 911, 919, 898 (milliseconds)

Versus

Number.prototype.benchFiby = function* benchFiby() {
   return this < 2 ? 1 : (yield* (this - 1).benchFiby()) + (yield* (this - 2).benchFiby()) + 1
}

var t = performance.now();
(30).benchFiby().next();
performance.now() - t

gives: 2998, 3125, 3116, 3140 (milliseconds)


On Mon, Mar 15, 2021 at 3:41 PM Vanessa Freudenberg <[hidden email]> wrote:
 
Hi Florin,

wow, that looks exciting!

This is indeed a much more thorough Squeak-to-JS mapping, whereas mine is more of a "traditional" VM. I love it!

Since my original post I implemented a mockup of "my" new JIT scheme:

You can play with it here: https://codepen.io/codefrau/pen/JjbmVGw

Could you share the performance numbers you are seeing for your benchFib, in comparison to SqueakJS or my mockup? I am curious if yield* is the way to go. 

Thanks for sharing! And congrats on your new job. My progress is slow too, I only work on it some weekends. But then, I'm not in a hurry :)

Vanessa


Re: [SqueakJS] Faster JIT ideas

Florin Mateoc-4
 
Small update: I can now see the UI, and even interact with it a little (before it breaks with various bugs :)) - woohoo!

Anyway, while debugging my various bugs, I found a small bug in SqueakJS primitive 53 - I do consult the primitives regularly :)
Squeak uses the IEEE 754 convention for the exponent (significand between 1 and 2), while frexp does not (significand between 0.5 and 1), so the result is off by one.
It should not matter much, since the significand is computed based on the exponent, but e.g. the fallback code executed when the primitive fails gives a different result from the primitive itself.
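[Editorial sketch] The two exponent conventions in question can be illustrated with a simplified emulation (ignoring zero, denormals, and infinities; not the actual SqueakJS code):

```javascript
// Simplified frexp-exponent emulation for a normal, nonzero, finite input:
// read the raw IEEE 754 exponent bits rather than using Math.log2, which
// can be off by one near exact powers of two.
function frexpExponent(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);                          // big-endian by default
  const biased = (view.getUint16(0) & 0x7ff0) >> 4;
  return biased - 1022;      // C frexp convention: 0.5 <= significand < 1
}

// Squeak's Float>>exponent uses the IEEE 754 convention
// (1 <= significand < 2), which is exactly one less.
const squeakExponent = (x) => frexpExponent(x) - 1;

console.log(frexpExponent(8));   // 4  (8 = 0.5 * 2^4)
console.log(squeakExponent(8));  // 3  (8 = 1.0 * 2^3)
```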


Re: [SqueakJS] Faster JIT ideas

codefrau
 
On Wed, Apr 7, 2021 at 3:07 PM Florin Mateoc <[hidden email]> wrote:
 
Small update: I can now see the UI, and even interact with it a little (before it breaks with various bugs :)) - woohoo!

Awesome!

And thanks for the bug report! I think you might have missed that SqueakJS does in fact subtract 1 from the frexp() result:

case 53: return this.popNandPushIntIfOK(argCount+1, this.frexp_exponent(this.stackFloat(0)) - 1);

I'm almost certain it is correct. I wrote a whole blog post about emulating frexp in JavaScript 7 years ago:

Btw, I determined that yield* is incredibly slow, because (as you were saying) every send now needs to go through a generator object. So I had to scrap my initial plan for the new SqueakJS JIT. I'm going with regular function calls now, reifying the stack via exception unwinding when needed, rather than trying to keep multiple call stacks alive. See this twitter thread:

Vanessa


Re: [SqueakJS] Faster JIT ideas

Florin Mateoc-4
 
Blush. I did miss the - 1, sorry for the noise.

With regard to yield*, I am sure the engines will continue to improve its performance - it is relatively new, so I assume there is some low-hanging fruit.
We'll see. I am trying to concentrate on correctness first; I also have a lot of room for optimizations.

Florin

On Wed, Apr 7, 2021 at 8:46 PM Vanessa Freudenberg <[hidden email]> wrote:
 
On Wed, Apr 7, 2021 at 3:07 PM Florin Mateoc <[hidden email]> wrote:
 
Small update: I can now see the UI, and even interact with it a little (before it breaks with various bugs :)) - woohoo!

Awesome!

And thanks for the bug report! I think you might have missed that SqueakJS does in fact subtract 1 from the frexp() result:

case 53: return this.popNandPushIntIfOK(argCount+1, this.frexp_exponent(this.stackFloat(0)) - 1);

I'm almost certain it is correct. I wrote a whole blog post about emulating frexp in JavaScript 7 years ago:

Btw, I determined that yield* is incredibly slow, because (as you were saying) every send now needs to return a generator function. So I had to scrap my initial plan for the new SqueakJS JIT. I'm going with regular function calls now, and reifying the stack via exception unwinding when needed, rather than trying to keep multiple call stacks alive. See this twitter thread: 

Vanessa

On Wed, Apr 7, 2021 at 3:07 PM Florin Mateoc <[hidden email]> wrote:
 
Small update: I can now see the UI, and even interact with it a little (before it breaks with various bugs :)) - woohoo!

Anyway, while debugging my various bugs, I found a small bug in the SqueakJS primitive 53 - I do consult them regularly :)
Squeak uses the IEEE 754 convention for the exponent (significand between 1 and 2), while frexp does not (significand between 0.5 and 1), so the result is off by one.
It should not matter much, as the significand is computed based on the exponent, but e.g. the code executed when the primitive fails gives a different result from the primitive itself.
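The two exponent conventions can be illustrated with a small sketch that extracts the exponent from a double's IEEE 754 bit pattern via a DataView. This is similar in spirit to, but not, SqueakJS's actual frexp_exponent implementation:

```javascript
// Extract the frexp-convention exponent (significand in [0.5, 1)) from a
// double's bit pattern. Illustrative, not SqueakJS's actual code.
function frexpExponent(value) {
  if (value === 0 || !isFinite(value)) return 0;
  const data = new DataView(new ArrayBuffer(8));
  data.setFloat64(0, value);
  let bits = (data.getUint32(0) >>> 20) & 0x7FF; // raw 11-bit exponent field
  if (bits === 0) { // subnormal: scale up, then compensate
    data.setFloat64(0, value * Math.pow(2, 64));
    bits = ((data.getUint32(0) >>> 20) & 0x7FF) - 64;
  }
  return bits - 1022; // frexp convention
}

// frexp says 2.0 = 0.5 * 2^2, while IEEE 754 says 2.0 = 1.0 * 2^1,
// hence the "- 1" applied to the frexp result in primitive 53:
console.log(frexpExponent(2.0));     // 2
console.log(frexpExponent(2.0) - 1); // 1, the IEEE/Squeak exponent
```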

On Wed, Mar 17, 2021 at 5:01 AM Vanessa Freudenberg <[hidden email]> wrote:
 
On Tue, Mar 16, 2021 at 1:38 AM Vanessa Freudenberg <[hidden email]> wrote:
Meanwhile, I made a little scheduler just to learn how to use yield*. Works nicely:

And here is a JIT mockup using that scheduler:

It's executing "5 benchFib", "4 benchFib", and "3 benchFib" in three separate processes, switching every 3 sends just for testing, and printing all contexts.
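A round-robin scheduler along these lines can be sketched as below. This is an illustration of the idea, not the actual codepen code: each process yields at every send, and the scheduler switches to the next process after a fixed number of sends.

```javascript
// Sketch of round-robin scheduling over generator-based "processes".
function* benchFib(n) {
  yield; // preemption point, one per send
  if (n < 2) return 1;
  return (yield* benchFib(n - 1)) + (yield* benchFib(n - 2)) + 1;
}

function runRoundRobin(processes, sendsPerSlice) {
  const results = new Array(processes.length).fill(undefined);
  let live = processes.length;
  while (live > 0) {
    for (let i = 0; i < processes.length; i++) {
      if (!processes[i]) continue;
      // Give this process a slice of sendsPerSlice sends, then move on.
      for (let s = 0; s < sendsPerSlice; s++) {
        const step = processes[i].next();
        if (step.done) {
          results[i] = step.value;
          processes[i] = null;
          live--;
          break;
        }
      }
    }
  }
  return results;
}

// Three processes, switching every 3 sends, as in the mockup:
console.log(runRoundRobin([benchFib(5), benchFib(4), benchFib(3)], 3));
// [ 15, 9, 5 ]
```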

No benchmarks yet.

Vanessa
 
 
On Mon, Mar 15, 2021 at 10:28 PM Vanessa Freudenberg <[hidden email]> wrote:
 
Thanks for that Florin, very helpful!

I'm curious why you need to make every send into a generator call, and not just rely on the one in your GlobalCheckForInterrupts()?

My implementation is intended to allow context switching that way:

if (--vm.interruptCheckCounter <= 0 && vm.handleDepthAndInterrupts(depth, thisProxy) === true) return false;

This is the same technique as in other VMs, where the actual check for a context switch is not done on every send, but only when the interruptCheckCounter goes below zero. In the first mockup on codepen, vm.handleDepthAndInterrupts prints the contents of all the contexts, proving that the information is accessible in case we need to context switch or reify the stack. My hope was that inside of handleDepthAndInterrupts() I could use yield* to do the actual context switching.

My second, exception based mockup on codepen uses the same approach, except that when a context switch is needed it would throw, unwinding the stack fully, and creating actual context objects along the way. I have not mocked up that part yet (because it would need a mockup interpreter too, to continue after unwind), but you can see the unwind working correctly by uncommenting the line with
// throw Error("unwind")
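The unwind-and-reify idea can be sketched as follows. The names (UnwindError, sendsUntilSwitch) are illustrative, not the mockup's actual code: when a switch is needed, a frame throws, and every frame on the way out appends a reified context object before rethrowing.

```javascript
// Sketch of reifying the JS call stack via exception unwinding.
class UnwindError extends Error {
  constructor() { super("unwind"); this.contexts = []; }
}

let sendsUntilSwitch = Infinity; // set to N to force a switch after N sends

function fib(n) {
  const ctx = { method: "fib", n }; // a real JIT would also capture a resume point
  try {
    if (--sendsUntilSwitch <= 0) throw new UnwindError();
    if (n < 2) return 1;
    return fib(n - 1) + fib(n - 2) + 1;
  } catch (e) {
    if (e instanceof UnwindError) e.contexts.push(ctx); // reify this frame
    throw e; // keep unwinding toward the scheduler
  }
}

// Full speed: plain calls, no generators involved.
console.log(fib(5)); // 15

// Force an unwind after 4 sends and collect the reified frames:
sendsUntilSwitch = 4;
try { fib(5); } catch (e) { console.log(e.contexts.map(c => c.n)); } // [ 2, 3, 4, 5 ]
```

The common case (no switch) pays only for the counter decrement, while the rare switch pays the full cost of unwinding and rebuilding context objects.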

I tried your benchFib vs benchFiby on Chrome, which seems to optimize generators a lot better than Firefox. On Chrome the overhead is just 50% or so, vs 300% on Firefox. Safari appears to be between the two.

I will need to make more complete mockups before deciding on a design. Especially how closures would be handled. Does anyone have a tiny one-method benchmark that would highlight closure performance?

Vanessa

On Mon, Mar 15, 2021 at 6:20 PM Florin Mateoc <[hidden email]> wrote:
 
I don't think the numbers could be meaningfully compared. The whole purpose of the yield* invocations is to enable process switching, which I don't think your jitted methods allow for. But then are you merely comparing yield* invocations against direct invocations? For sure, the yield* ones will be much slower.
Leaving aside my implementation, which also uses yield* for #<, #-, and #+ and would therefore be much slower than the code below (the operators are surely well optimized by JS engines), we can just measure direct vs yield* calls for the main recursive invocation:

Number.prototype.benchFib = function benchFib() {
  return this < 2 ? 1 : (this - 1).benchFib() + (this - 2).benchFib() + 1
}

var t = performance.now();
(30).benchFib();
performance.now() - t

gives, on my laptop in Firefox: 920, 911, 919, 898 (milliseconds)

Versus

Number.prototype.benchFiby = function* benchFiby() {
   return this < 2 ? 1 : (yield* (this - 1).benchFiby()) + (yield* (this - 2).benchFiby()) + 1
}

var t = performance.now();
(30).benchFiby().next();
performance.now() - t

gives 2998, 3125, 3116, 3140


On Mon, Mar 15, 2021 at 3:41 PM Vanessa Freudenberg <[hidden email]> wrote:
 
Hi Florin,

wow, that looks exciting!

This is indeed a much more thorough Squeak-to-JS mapping, where mine is more of a "traditional" VM. I love it!

Since my original post I implemented a mockup of "my" new JIT scheme:

You can play with it here: https://codepen.io/codefrau/pen/JjbmVGw

Could you share the performance numbers you are seeing for your benchFib, in comparison to SqueakJS or my mockup? I am curious if yield* is the way to go. 

Thanks for sharing! And congrats on your new job. My progress is slow too, I only work on it some weekends. But then, I'm not in a hurry :)

Vanessa

On Sun, Mar 14, 2021 at 6:18 PM Florin Mateoc <[hidden email]> wrote:
 
Hi Vanessa,

Sorry for the delay in responding - as somebody who has been inspired by your SqueakJS project, I think I should mention that I am working on a related project, for now tentatively called JsSqueak.
In addition to the inspiration provided by SqueakJS, it also scratches my longstanding itch about compiling (transpiling) Squeak.

I hesitated to talk about it, as it is still a work in progress - after small bits and pieces that I worked on over a long period, I had the opportunity to spend a significant and uninterrupted chunk of time on it last summer, when I was unemployed for 3 months, and I was able to make good progress. I was optimistically thinking of releasing a first version before the end of last year, but after I started working on my new job, progress on JsSqueak has slowed down significantly. I must confess that I (and especially my wife) hesitate in recreating that productive unemployed situation :)

I started with Squeak 4.5 - I already had code transforming Smalltalk code to a form more suitable for translation - and I also started with VMMakerJS-dtl.18 for the plugin generation part. Of course, I had to heavily modify it, since I have to get rid of the stack usage for arguments/receiver and returns.
Both of these big parts are working. I also implemented most numbered primitives by hand - they are inlined at generation time in the methods calling them.
I am also taking advantage of the latest and greatest additions to JavaScript. I am, of course, using classes, and the parallel class-side hierarchy is implemented using statics. To implement green threads/process switching, all translated methods are implemented as generator functions, and all calls are through yield* expressions. The preemption/interrupt check points are inlined. With this, a process switch is achieved by simply yield-ing (in the process/semaphore primitives).
With this, the Integer>>#benchFib method is translated (as a method in Number.prototype, there is one more, simpler, implementation in BigInt) as:

*_benchFib() {
    if (Number.isSafeInteger(this.valueOf())) { // Effective (inherited or local) source for #benchFib in SmallInteger
        /* Handy send-heavy benchmark */
        /* (result // seconds to run) = approx calls per second */
        /* | r t |
           t := Time millisecondsToRun: [r := 26 benchFib].
           (r // 1000) // t */
        /* 138000 on a Mac 8100//100 */
        if (GlobalActivationCounter-- < 0) yield* GlobalCheckForInterrupts();

        return (yield* this._lt(2)).booleanValueOf("questionMark:colon:") ? (1) :
            (yield* (yield* (yield* (yield* this._sub(1))._benchFib())._add(yield* (yield* this._sub(2))._benchFib()))._add(1));
    } else // No implementation for #benchFib in Float hierarchy, trigger a DNU
        return yield* super._benchFib();
}

The top-level check for SmallIntegers is there because both SmallInteger and Float are mapped to Number.
The booleanValueOf call implements the mustBeBoolean machinery (it actually translates directly to a DNU, as is done nowadays in Squeak).
Of course, in Boolean, booleanValueOf is just an alias for valueOf.
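The booleanValueOf machinery described above can be sketched like this (illustrative code, not JsSqueak's actual generated output): Boolean aliases booleanValueOf to valueOf, and any other receiver triggers a doesNotUnderstand.

```javascript
// Sketch of the mustBeBoolean machinery; names are illustrative.
Boolean.prototype.booleanValueOf = Boolean.prototype.valueOf;

Object.prototype.booleanValueOf = function (selector) {
  // Any non-Boolean receiver of an inlined conditional ends up here.
  throw new Error("doesNotUnderstand: #" + selector);
};

console.log(true.booleanValueOf("questionMark:colon:")); // true
try {
  (42).booleanValueOf("questionMark:colon:");            // non-Boolean receiver
} catch (e) {
  console.log(e.message); // doesNotUnderstand: #questionMark:colon:
}
```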

As you can see, though, this is not terribly efficient, but there is room for improvement/optimizations. With more work, in this case, the _lt call could be replaced by the < operator, and even the _sub and _add calls could be optimized, although not completely, since their result can morph into a LargeInteger (mapped to BigInt).
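That morphing is why + alone cannot replace _add. A minimal sketch (illustrative, not JsSqueak's actual generated _add): a SmallInteger sum that leaves the safe-integer range must become a BigInt.

```javascript
// Sketch of SmallInteger addition overflowing into LargeInteger (BigInt).
function addMorphing(a, b) {
  if (typeof a === "number" && typeof b === "number") {
    const sum = a + b;
    if (Number.isSafeInteger(sum)) return sum; // still a SmallInteger
    return BigInt(a) + BigInt(b);              // morph into LargeInteger
  }
  return BigInt(a) + BigInt(b);                // LargeInteger arithmetic
}

console.log(addMorphing(1, 2));                       // 3
console.log(addMorphing(Number.MAX_SAFE_INTEGER, 1)); // 9007199254740992n
```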

As hinted above, SmallInteger is mapped to Number (in the safeInteger range), Float is mapped to Number as well, and LargeInteger is mapped to BigInt.
BlockClosure is mapped to Function, Boolean is mapped to Boolean, Character is mapped to String, weak references are implemented via WeakRef. 
I have briefly considered also doing slightly higher-level mappings, for IdentitySet to Set and IdentityDictionary to Map, but this is not a priority.
The image is serialized sort of like a JavaScript storeString. Processes and contexts are the exception: they are not brought back in on the JavaScript side. Blocks are both stored and loaded.
Non-local returns, unwind blocks, resumable and restartable exceptions are implemented via JavaScript exception handling plus explicit handler block chains associated with the processes.
The "image" starts with the global state loaded, but all processes are started from scratch instead of resumed. A non-UI image is thus trivially started.

One major todo left is hooking up the UI/browser. I did take vm.display.browser.js from SqueakJS and adapted the code to implement its numbered primitives, but I still have to work through squeak.js from the same project to initialize and hook up the display.

Florin



On Sun, Mar 7, 2021 at 11:17 PM Vanessa Freudenberg <[hidden email]> wrote:
 
Hi all,

ideas for a faster SqueakJS JIT have been swirling around in my head for many years. This weekend I took the time to write some of them down:


Feedback welcome!

Vanessa