Hi all,
Ideas for a faster SqueakJS JIT have been swirling around in my head for many years. This weekend I took the time to write some of them down: Feedback welcome!

Vanessa
Hi Vanessa,

Sorry for the delay in responding - as somebody who has been inspired by your SqueakJS project, I think I should mention that I am working on a related project, for now tentatively called JsSqueak. In addition to the inspiration provided by SqueakJS, it also scratches my longstanding itch about compiling (transpiling) Squeak. I hesitated to talk about it, as it is still a work in progress - after small bits and pieces that I worked on over a long period, I had the opportunity to spend a significant and uninterrupted chunk of time on it last summer, when I was unemployed for 3 months, and I was able to make good progress. I was optimistically thinking of releasing a first version before the end of last year, but after I started working at my new job, progress on JsSqueak has slowed down significantly. I must confess that I (and especially my wife) hesitate to recreate that productive unemployed situation :)

I started with Squeak 4.5 - I already had code transforming Smalltalk code into a form more suitable for translation - and I also started with VMMakerJS-dtl.18 for the plugin generation part. Of course, I had to heavily modify it, since I have to get rid of the stack usage for arguments/receiver and returns. Both of these big parts are working. I also implemented most numbered primitives by hand - they are inlined at generation time in the methods calling them.

I am also taking advantage of the latest and greatest additions to JavaScript. I am, of course, using classes, and the parallel class-side hierarchy is implemented using statics. To implement green threads/process switching, all translated methods are implemented as generator functions, and all calls go through yield* expressions. The preemption/interrupt check points are inlined. With this, a process switch is achieved by simply yield-ing (in the process/semaphore primitives).

With this, the Integer>>#benchFib method is translated (as a method in Number.prototype; there is one more, simpler implementation in BigInt) as:

    *_benchFib() {

The top-level check for smallIntegers is because both SmallInteger and Float are mapped to Number. The booleanValueOf call is for implementing the mustBeBoolean machinery (it actually translates directly to DNU, as is done nowadays in Squeak). Of course, in Boolean, booleanValueOf is just an alias for valueOf. As you can see, though, this is not terribly efficient, but there is room for improvement/optimizations. With more work, in this case, the _lt call could be replaced by the < operator, and even the _sub and _add calls could be optimized.

As hinted above, SmallInteger is mapped to Number (in the safeInteger range), Float is mapped to Number as well, and LargeInteger is mapped to BigInt. BlockClosure is mapped to Function, Boolean is mapped to Boolean, Character is mapped to String, and weak references are implemented via WeakRef.

The image is serialized sort of like a JavaScript storeString. No processes or contexts though - or rather, they are not brought back in on the JavaScript side. Blocks are both stored and loaded. Non-local returns, unwind blocks, and resumable and restartable exceptions are implemented via JavaScript exception handling plus explicit handler block chains associated with the processes. The "image" starts with the global state loaded, but all processes are started from scratch instead of resumed. A non-UI image is thus trivially started. One major todo left is hooking up the UI/browser.
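For illustration only, here is a minimal sketch of what such a generator-based translation could look like. This is not JsSqueak's actual generated output; the _lt/_sub/_add send names, the stand-in definitions, and the driver line are assumptions made just so the shape is runnable:

    // Stand-ins for the generated arithmetic sends (assumed names, illustration only)
    Number.prototype._lt  = function* (n) { return this < n; };
    Number.prototype._sub = function* (n) { return this - n; };
    Number.prototype._add = function* (n) { return this + n; };
    Boolean.prototype.booleanValueOf = Boolean.prototype.valueOf;  // alias, as described above

    Number.prototype._benchFib = function* _benchFib() {
        // top-level check: SmallInteger and Float are both mapped to Number
        if (Number.isSafeInteger(this.valueOf())) {
            if ((yield* this._lt(2)).booleanValueOf()) return 1;
            const a = yield* (yield* this._sub(1))._benchFib();
            const b = yield* (yield* this._sub(2))._benchFib();
            return yield* (yield* a._add(b))._add(1);
        }
        // ... the Float case would follow here
    };

    // with no plain yields inside, one next() runs the whole computation:
    console.log((10)._benchFib().next().value);  // 177

The point of the shape is that every send becomes a yield* of another generator, so a plain yield anywhere below (for example in a semaphore primitive) can suspend the whole logical call chain.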
I did take vm.display.browser.js from SqueakJS and adapted the code in order to implement its numbered primitives, but I still have to work through squeak.js from the same project to do the initialization.

Florin

On Sun, Mar 7, 2021 at 11:17 PM Vanessa Freudenberg <[hidden email]> wrote:
I almost forgot: this is what my console looks like when trying to start via Chrome
startupBrowser.mjs:23 Loading initial glue code
startupBrowser.mjs:26 Loading translated Squeak image 4.5
startupBrowser.mjs:29 Loading generated classes (all the code, plus minimal reflection data)
startupBrowser.mjs:34 Elapsed 42074ms
startupBrowser.mjs:36 Loading manual overrides for the generated code
startupBrowser.mjs:39 Loading more glue code
startupBrowser.mjs:43
startupBrowser.mjs:44 Loading serialized image state
SmalltalkGlobals.js:254 reified compiledMethod for method _notEqEq of class _ProtoObject has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _eqEq of class _ProtoObject has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _class of class _Object has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _lookup_ of class _Symbol class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _rehash of class _Symbol class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _allSymbolTablesDo_ of class _Symbol class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _compactSymbolTable of class _Symbol class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _allSymbols of class _Symbol class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _allSymbolTablesDo_after_ of class _Symbol class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _intern_ of class _Symbol class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _minVal of class _SmallInteger class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _maxVal of class _SmallInteger class has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _installLowSpaceWatcher of class _SmalltalkImage has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _signal of class _Exception has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _return_ of class _Exception has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _resumeUnchecked_ of class _Exception has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _pass of class _Exception has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _retry of class _Exception has not been set up, the method might be a manual JavaScript override
SmalltalkGlobals.js:254 reified compiledMethod for method _outer of class _Exception has not been set up, the method might be a manual JavaScript override
startupBrowser.mjs:47 Elapsed 105913ms
startupBrowser.mjs:49
startupBrowser.mjs:51
startupBrowser.mjs:52 Loading VM/scheduler
SmalltalkVM.js:81 cleaning up
SmalltalkVM.js:141
SmalltalkVM.js:146
SmalltalkVM.js:177 setting up the scheduler loop
SmalltalkVM.js:195 start known processes
SmalltalkVM.js:196 _Delay._startTimerEventLoop
SmalltalkVM.js:199 _Delay._startUp_(true)
SmalltalkVM.js:201 _ProcessorScheduler._startUp_(true)
SmalltalkVM.js:203 _WeakArray._startUp_(true)
SmalltalkVM.js:205 finished starting non-UI known processes
startUI_browser.mjs:25 starting EventSensor
startUI_browser.mjs:27 spawning primordial UI process
SmalltalkVM.js:180 entering new active process with priority 80
SmalltalkVM.js:184 primitive 86 semaphore wait
SmalltalkVM.js:180 entering new active process with priority 60
SmalltalkVM.js:184 primitive 86 semaphore wait
SmalltalkVM.js:180 entering new active process with priority 60
SmalltalkVM.js:184 primitive 85 semaphore signal
SmalltalkVM.js:180 entering new active process with priority 80
SmalltalkVM.js:184 primitive 86 semaphore wait
SmalltalkVM.js:180 entering new active process with priority 60
SmalltalkVM.js:184 primitive 86 semaphore wait
SmalltalkVM.js:180 entering new active process with priority 50
SmalltalkVM.js:184 primitive 86 semaphore wait
SmalltalkVM.js:180 entering new active process with priority 40
_DisplayScreen.js:843 Uncaught TypeError: Cannot read property 'width' of undefined

On Sun, Mar 14, 2021 at 9:18 PM Florin Mateoc <[hidden email]> wrote:
Hi Florin,

wow, that looks exciting! This is indeed a much more thorough Squeak-to-JS mapping, where mine is more of a "traditional" VM. I love it!

Since my original post I implemented a mockup of "my" new JIT scheme: You can play with it here: https://codepen.io/codefrau/pen/JjbmVGw

Could you share the performance numbers you are seeing for your benchFib, in comparison to SqueakJS or my mockup? I am curious if yield* is the way to go.

Thanks for sharing! And congrats on your new job. My progress is slow too, I only work on it some weekends. But then, I'm not in a hurry :)

Vanessa

On Sun, Mar 14, 2021 at 6:18 PM Florin Mateoc <[hidden email]> wrote:
I don't think the numbers could be meaningfully compared. The whole purpose of the yield* invocations is to enable process switching, which I don't think your jitted methods allow for. But then are you merely comparing yield* invocations against direct invocations? For sure, the yield* ones will be much slower - let alone my implementation, which also uses yield* for #< , #- and #+, so it would be much slower than the version below. Since the operators are surely well optimized by the JS engines, we can just measure direct vs yield* for the main recursive invocation:

    Number.prototype.benchFib = function benchFib() {
        return this < 2 ? 1 : (this - 1).benchFib() + (this - 2).benchFib() + 1
    }
    var t = performance.now(); (30).benchFib(); performance.now() - t

gives on my laptop in Firefox 920, 911, 919, 898

versus

    Number.prototype.benchFiby = function* benchFiby() {
        return this < 2 ? 1 : (yield* (this - 1).benchFiby()) + (yield* (this - 2).benchFiby()) + 1
    }
    var t = performance.now(); (30).benchFiby().next(); performance.now() - t

gives 2998, 3125, 3116, 3140

On Mon, Mar 15, 2021 at 3:41 PM Vanessa Freudenberg <[hidden email]> wrote:
Thanks for that Florin, very helpful! I'm curious why you need to make every send into a generator call, and not just rely on the one in your GlobalCheckForInterrupts()? My implementation is intended to allow context switching that way:
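As a rough illustration of that counting scheme (GlobalCheckForInterrupts and vm.handleDepthAndInterrupts are the names used in this thread; the counter and everything else here is assumed, not the actual codepen code):

    const vm = {
        interruptCheckCounter: 1000,
        handleDepthAndInterrupts() {
            // in the mockup this prints all contexts; a real VM would decide
            // here whether to switch processes
        },
    };

    function GlobalCheckForInterrupts() {
        if (--vm.interruptCheckCounter < 0) {
            vm.interruptCheckCounter = 1000;
            vm.handleDepthAndInterrupts();   // the rare, possibly expensive path
        }
    }

    // every jitted method would start with the cheap counter check:
    function jittedExampleSend() {
        GlobalCheckForInterrupts();
        // ... the translated method body would follow
    }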
This is the same technique as in other VMs, where the actual check for a context switch is not done for every send, but only when the interruptCheckCounter goes below zero. In the first mockup on codepen, vm.handleDepthAndInterrupts prints the contents of all the contexts, proving that the information is accessible in case we need to context switch or reify the stack. My hope was that inside handleDepthAndInterrupts() I could use yield* to do the actual context switching.

My second, exception-based mockup on codepen uses the same approach, except that when a context switch is needed it would throw, unwinding the stack fully and creating actual context objects along the way. I have not mocked up that part yet (because it would need a mockup interpreter too, to continue after the unwind), but you can see the unwind working correctly by uncommenting the line with // throw Error("unwind").

I tried your benchFib vs benchFiby on Chrome, which seems to optimize generators a lot better than Firefox. On Chrome the overhead is just 50% or so, vs 300% on Firefox. Safari appears to be between the two.

I will need to make more complete mockups before deciding on a design. Especially how closures would be handled. Does anyone have a tiny one-method benchmark that would highlight closure performance?

Vanessa

On Mon, Mar 15, 2021 at 6:20 PM Florin Mateoc <[hidden email]> wrote:
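A minimal sketch of the exception-based unwinding idea described above, under the same caveat that this is not the actual codepen code: the class name, the tiny counter, and the shape of the reified context are all invented for illustration, and resuming after the unwind is left out here too.

    class UnwindToScheduler extends Error {
        constructor() { super("unwind"); this.contexts = []; }
    }

    let sendsUntilCheck = 3;   // deliberately tiny so the unwind triggers quickly

    function checkForInterrupts() {
        if (--sendsUntilCheck < 0) {
            sendsUntilCheck = 3;
            throw new UnwindToScheduler();   // a process switch was requested
        }
    }

    // what a jitted benchFib might look like under this scheme
    function benchFib(receiver) {
        try {
            checkForInterrupts();
            if (receiver < 2) return 1;
            return benchFib(receiver - 1) + benchFib(receiver - 2) + 1;
        } catch (e) {
            if (e instanceof UnwindToScheduler) {
                // reify this frame; a real VM would also record the pc and temps
                e.contexts.push({ method: "benchFib", receiver });
            }
            throw e;   // keep unwinding towards the scheduler
        }
    }

    try {
        benchFib(10);
    } catch (e) {
        if (!(e instanceof UnwindToScheduler)) throw e;
        console.log(e.contexts);   // innermost-first chain of reified contexts
        // continuing from these contexts would need the interpreter part that
        // is not mocked up yet
    }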
Well, who makes the generator call inside GlobalCheckForInterrupts if the method hosting the GlobalCheckForInterrupts call was not itself called via yield*? yield* needs to traverse the whole execution stack. If you do a top-level next(), no inner yields can be reached if the inner invocations are not generator calls. Sure, you can print the stack, but have you checked that you can switch and then come back and resume an interrupted one?

Anyway, thank you for the good news with Chrome. I will also check with Node.

On Mon, Mar 15, 2021 at 10:28 PM Vanessa Freudenberg <[hidden email]> wrote:
Perhaps better expressed: since benchFiby is a generator function, the call benchFiby() does not execute its contents (so it cannot reach any potential yields inside it), it just returns a generator, so we have to make all the calls generator calls.

On Mon, Mar 15, 2021 at 10:56 PM Florin Mateoc <[hidden email]> wrote:
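A tiny stand-alone demonstration of that point (not code from either project):

    function* g() { console.log("body runs"); return 42; }

    const it = g();               // nothing is printed: the body has not started
    console.log(it.next());       // "body runs", then { value: 42, done: true }

    // inside another generator, yield* both starts g and forwards any yields:
    function* caller() { const v = yield* g(); return v + 1; }
    console.log(caller().next()); // "body runs", then { value: 43, done: true }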
On Mon, Mar 15, 2021 at 7:57 PM Florin Mateoc <[hidden email]> wrote:

Perhaps better expressed: since benchFiby is a generator function, the call benchFiby() does not execute its contents (so it cannot reach any potential yields inside it), it just returns a generator, so we have to make all the calls generator calls.

Ah, that's a JS thing I had not understood yet, that all calls to generator functions need to be made using yield*, and you can only use yield* inside of a generator function.
I think that would be the trampoline function that goes from interpreted code to compiled code. That same function needs to handle grabbing function args off the context's stack, passing them as arguments to the jitted method, and once that returns, popping the stack + pushing the result.
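A rough sketch of what such a trampoline could look like, assuming a hypothetical context object with a stack array and an sp index (not SqueakJS's actual code):

    // pop receiver and arguments off the interpreted context's stack,
    // call the jitted function, then push its result back
    function trampoline(context, jittedFn, argCount) {
        const sp = context.sp;                                // index of the top of stack
        const receiver = context.stack[sp - argCount];
        const args = context.stack.slice(sp - argCount + 1, sp + 1);
        const result = jittedFn.apply(receiver, args);
        context.sp = sp - argCount;                           // pop receiver + args,
        context.stack[context.sp] = result;                   // push the result
        return result;
    }

    // usage with a one-argument jitted method:
    const ctx = { stack: [3, 4], sp: 1 };                     // receiver 3, argument 4
    trampoline(ctx, function (n) { return this + n; }, 1);
    console.log(ctx.stack[ctx.sp]);                           // 7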
Okay. I'll have to think about that. My hope was that all the regular execution can happen using regular function invocations and only process switching would use yield. But you're right, we need to use yield* all the way down. Hope that's not too expensive ... or maybe unwinding to process switch is the better tradeoff after all.
Not yet, I'm still designing it and trying things.
Meanwhile, I made a little scheduler just to learn how to use yield*. Works nicely:
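Along these lines, perhaps - a toy round-robin version (my sketch, since the actual scheduler code is not shown here): each process is a generator that yields wherever it is willing to be switched out, and a simple loop drives them.

    function runRoundRobin(processThunks) {
        const queue = processThunks.map(p => p());   // start each generator
        while (queue.length > 0) {
            const gen = queue.shift();
            const { done } = gen.next();             // run until its next yield
            if (!done) queue.push(gen);              // not finished: requeue it
        }
    }

    function* counter(name, n) {
        for (let i = 1; i <= n; i++) {
            console.log(name, i);
            yield;                                   // possible switch point
        }
    }

    runRoundRobin([() => counter("A", 3), () => counter("B", 2)]);
    // prints: A 1, B 1, A 2, B 2, A 3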
On Tue, Mar 16, 2021 at 1:38 AM Vanessa Freudenberg <[hidden email]> wrote:
And here is a JIT mockup using that scheduler: It's executing "5 benchFib", "4 benchFib", and "3 benchFib" in three separate processes, switching every 3 sends just for testing, and printing all contexts. No benchmarks yet.

Vanessa
Small update: I can now see the UI, and even interact with it a little (before it breaks with various bugs :)) - woohoo!

Anyway, while debugging my various bugs, I found a small bug in the SqueakJS primitive 53 - I do consult them regularly :) Squeak uses the IEEE 754 convention for the exponent (significand between 1 and 2), while frexp does not (significand between 0.5 and 1), so the result is off by one. It should not matter much, as the significand is computed based on the exponent, but e.g. the code executed when the primitive fails gives a different result from the primitive itself.

On Wed, Mar 17, 2021 at 5:01 AM Vanessa Freudenberg <[hidden email]> wrote:
Awesome! And thanks for the bug report! I think you might have missed that SqueakJS does in fact subtract 1 from the frexp() result:

    case 53: return this.popNandPushIntIfOK(argCount+1, this.frexp_exponent(this.stackFloat(0)) - 1);

I'm almost certain it is correct. I wrote a whole blog post about emulating frexp in JavaScript 7 years ago:

Btw, I determined that yield* is incredibly slow, because (as you were saying) every send now needs to return a generator. So I had to scrap my initial plan for the new SqueakJS JIT. I'm going with regular function calls now, and reifying the stack via exception unwinding when needed, rather than trying to keep multiple call stacks alive. See this twitter thread:

Vanessa

On Wed, Apr 7, 2021 at 3:07 PM Florin Mateoc <[hidden email]> wrote:
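For reference, one way the frexp exponent can be emulated in JavaScript and related to the - 1 above (a sketch of the general bit-reading approach, not necessarily SqueakJS's exact frexp_exponent code):

    // Returns e such that value = m * 2^e with 0.5 <= |m| < 1, i.e. C's frexp()
    // convention; subtracting 1 gives the exponent for a significand in [1, 2),
    // which is what Squeak's primitive 53 answers.
    function frexpExponent(value) {
        if (value === 0 || !isFinite(value)) return 0;
        const data = new DataView(new ArrayBuffer(8));
        data.setFloat64(0, value);
        let bits = (data.getUint32(0) >>> 20) & 0x7FF;        // biased exponent field
        if (bits === 0) {                                      // subnormal: scale up first
            data.setFloat64(0, value * Math.pow(2, 64));
            bits = ((data.getUint32(0) >>> 20) & 0x7FF) - 64;
        }
        return bits - 1022;                                    // unbias, frexp convention
    }

    console.log(frexpExponent(1.0));       // 1  (1.0 = 0.5 * 2^1, frexp style)
    console.log(frexpExponent(1.0) - 1);   // 0  (1.0 = 1.0 * 2^0, IEEE 754 style)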
Blush. I did miss the - 1, sorry for the noise.

With regards to yield*, I am sure the engines will continue to improve its performance - it is relatively new, so I assume there might be some low-hanging fruit. We'll see. I am trying to concentrate on correctness first; I also have a lot of room for optimizations.

Florin

On Wed, Apr 7, 2021 at 8:46 PM Vanessa Freudenberg <[hidden email]> wrote: