ImpCompiler - introduction, status, problems

Herby Vojčík
Hello,

this is a somewhat longer mail that goes into the details of the
alternative ImpCompiler and its current status and problems. The first
part may also help those who want to understand the existing compiler.



ImpCompiler is an Amber compiler with different code generation, but
reusing the same parser and AST as the existing compiler (I'll call it
FunCompiler).



FunCompiler is built around expressions - every node (with the exception
of a few, notably return, assignment and inline JS statement) is
compiled to a JS expression that produces the value of the node.
In particular, a send is also assumed to yield an expression; for
example, `32 foo` produces the expression `smalltalk.send((32), "_foo",
[])`. Compiler optimizations must obey the assumption and return an
expression as well, so `true ifTrue: [whatever] ifFalse: [somethingelse]`
is inlined as
`((..check if true is Boolean..)?((true)?(function(){...compile
whatever...})():(function(){...compile
somethingelse...})()):smalltalk.send((true),"_ifTrue_ifFalse_",
[function(){...},function(){...}]))`. IIFEs are powerful, but alas, you
can't easily inline a thing like `x ifTrue: [^42]` into `if ((..x is
Boolean..)) { if (x) return 42; } else {smalltalk.send(...)}`; you have
to use ?: with IIFEs, which throw to do the non-local return, and the
throw must then be caught in the method.
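
To make the expression-only constraint concrete, here is a minimal
hand-written sketch, not actual FunCompiler output. The `smalltalk.send`
stub, the function names, the shape of the thrown object and the
trailing `^0` are all assumptions made so that the code can run.

```javascript
// Hypothetical stand-in for Amber's runtime send, only so the sketch executes.
const smalltalk = {
  send(receiver, selector, args) {
    return receiver[selector].apply(receiver, args);
  },
};

// Expression-style inlining of `x ifTrue: [1] ifFalse: [2]`:
// the whole thing stays one JS expression, each block is an IIFE.
function inlinedAsExpression(x) {
  return (typeof x === "boolean")
    ? (x ? (function () { return 1; })() : (function () { return 2; })())
    : smalltalk.send(x, "_ifTrue_ifFalse_", [
        function () { return 1; },
        function () { return 2; },
      ]);
}

// Expression-style `x ifTrue: [^42]. ^0`: a `return` inside the IIFE would
// only leave the IIFE, so the non-local return is done by throwing, and the
// method has to catch it.
function methodWithNonLocalReturn(x) {
  try {
    (typeof x === "boolean")
      ? (x ? (function () { throw { early: 42 }; })() : undefined)
      : smalltalk.send(x, "_ifTrue_", [function () { throw { early: 42 }; }]);
    return 0;
  } catch (e) {
    if (e !== null && typeof e === "object" && "early" in e) return e.early;
    throw e; // not ours, pass it on
  }
}
```

The second function shows the cost: even the simple boolean case goes
through a function call and an exception.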



ImpCompiler is built as a sort-of clone of FunCompiler, but it changes
the main assumption - instead of compiling each node to a
value-producing expression, it compiles it into a (lazy) assignment of
the value to a (maybe internal) variable. So `32 foo: 'bar'` gets
compiled into a series of lazy assignments: `$0=32; $2="bar"; $1=[$2];
target=smalltalk.send($0,"_foo_",$1);`. There must always be a target
variable for the assignment of the value - it is the core of
ImpCompiler. The notion of target variable is a bit more abstract - it
can represent the variable to store the value into, but it can also be
blank (just do it, the value is not needed - used for statements in a
method and/or block), '^' (return the value using direct return) or '!'
(return the value from a block using the throw-catch mechanism).
If all assignments were left lazy, no code would ever be produced, so
at some points the values must actually be computed (that is, the
code that computes them must be produced). There are basically two such
situations:
- before an actual assignment - some targets are not lazy: the already
mentioned "just do it", "^" and "!" - they just need the value; also any
other "real" variable - like one used in a Smalltalk assignment
(the assignment node just takes the variable to assign to from the LHS
and lets the RHS be processed normally, only setting the "target" to the
variable-to-be-assigned; since it is not lazy, the actual JS code to
compute the value and assign it is produced).
- before an if - whenever the code is to be inlined so that it uses if,
uncertainty appears and all the (free*) lazy values must be eagerly
evaluated (and stored to actual $n variables). Also, the target of such
an inlined node is made non-lazy - so it is assigned eagerly in all
branches of the if.
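
The target idea can be sketched roughly as follows. This is an
illustration only, not ImpCompiler's code: `emitAssignment`, `LazyVars`
and the way internal variables are numbered are made up for the example.

```javascript
// Hypothetical helper: turn "assign `expr` to `target`" into a JS statement,
// following the four kinds of target described above.
function emitAssignment(target, expr) {
  switch (target) {
    case "":  return expr + ";";                      // blank: "just do it"
    case "^": return "return " + expr + ";";          // direct return
    case "!": return "throw $early=[" + expr + "];";  // return from a block
    default:  return target + "=" + expr + ";";       // a real variable
  }
}

// Lazy internal variables: remember the expression, emit nothing yet.
class LazyVars {
  constructor() {
    this.values = new Map();
    this.counter = 0;
  }
  assign(expr) {
    const name = "$" + this.counter++;
    this.values.set(name, expr);
    return name;
  }
  // Where a lazy variable is used, substitute the remembered expression.
  resolve(name) {
    return this.values.has(name) ? this.values.get(name) : name;
  }
}

// The `foo:` send from the text, compiled with a blank target.
const vars = new LazyVars();
const receiver = vars.assign("32");
const arg = vars.assign('"bar"');
const args = vars.assign("[" + vars.resolve(arg) + "]");
const call = "smalltalk.send(" + vars.resolve(receiver) + ',"_foo_",' +
  vars.resolve(args) + ")";
const generated = emitAssignment("", call);
```

Here `generated` is `smalltalk.send(32,"_foo_",["bar"]);`: none of the
internal variables had to be materialized, because every one of them was
consumed inside the final expression.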

In ImpCompiler, `true ifTrue: [^42]` produces a series of lazy assignments:
`$0=true; "eager evaluation, 'if' is ahead"; if (...$0 is boolean...) if
($0) { "return" = 42; "just do it" = nothing } else { "just do it" =
optional***(nil) } else { $3 = function() { "throw" = 42; "return" =
nothing }; $2 = [$3]; "just do it" = smalltalk.send($0, "_ifTrue_", $2) }`
which ends up as
`$0=true; if (...$0 is boolean...) if ($0) { return 42; } else {} else {
smalltalk.send($0, "_ifTrue_", [function(){ throw $early=[42]; }]) }`
while marking $0 as "I need this internal variable declared as real,
since it is used in the code".
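
For those who want to run it, here is a hand-written rendering of that
final code inside a complete method. The `smalltalk.send` stub, the
surrounding try/catch, the trailing `return 0` and the `receiver`
parameter (standing in for the literal `true`) are assumptions added so
that both branches can be exercised.

```javascript
// Hypothetical stand-in for the Amber runtime, only so the sketch executes.
const smalltalk = {
  send(receiver, selector, args) {
    return receiver[selector].apply(receiver, args);
  },
};

function method(receiver) {
  var $0, $early;
  try {
    $0 = receiver; // eagerly evaluated, because an `if` is ahead
    if (typeof $0 === "boolean") {
      if ($0) { return 42; } else {}
    } else {
      // Not inlined: the block is a real function, so `^42` becomes a throw.
      smalltalk.send($0, "_ifTrue_", [function () { throw $early = [42]; }]);
    }
    return 0;
  } catch (e) {
    // Assumed catch shape: recognize our own early-return marker.
    if (Array.isArray(e) && e === $early) return e[0];
    throw e;
  }
}
```

With a boolean receiver the method leaves with a plain `return` and no
exception is involved; only a non-boolean receiver takes the throw path.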

(the "nothing" in the lazy assignments is the "expression value" of
a return node - an implementation detail that lets the return node be
processed similarly to expression / assignment nodes - no code is
produced if this "nothing" is assigned to any target)



Status: ImpCompiler is basically working, it can process any Amber code
and produce JS code for it. There are just a few stoppers:

- variable name clash and inlining: if names of variables clash between
a method and a block (or a block and a subblock), inlining the block is
a problem. Even more complicated is a variable in the block colliding
with an "unknown" global variable in the method (especially if the
"unknown" appears later - the compiler would need two passes, catching
unknown variable names in the first pass - this is a problem of
FunCompiler as well, see issue 191 on GitHub).
The solutions are:
- to use a different name in the block/subblock - the solution I would
embrace most cheerfully. The problem with internal renaming is that it
invalidates the names of variables used in inline JS statement blocks.
I would have no problem with completely prohibiting inline JS in blocks
- but there certainly are people who would like to have it there as
well.**
- to check the block for "safety" and, if unsafe, simply IIFE it. This
not only increases complexity, but is also a bit ridiculous: if I want
to tweak a block to go really fast by adding a well-placed inline JS, it
is automatically unsafe, thus IIFEd, thus slower :-/
- saving/restoring the offending variables: it can save the day for
a basic name clash, but it is weak against the unknown-block name clash.
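
A small illustration of the basic name clash, hand-written rather than
compiler output. The Smalltalk in the comment is a made-up example and
the `$b1_a` name is an invented renaming scheme.

```javascript
// Smalltalk being compiled (illustrative):
//   | a |  a := 1.  cond ifTrue: [ | a | a := 2 ].  ^a      "answers 1"

// Naive inlining: the block's `a` becomes the method's `a`,
// because `var` is function-scoped in JS.
function naiveInline(cond) {
  var a;
  a = 1;
  if (cond) { var a; a = 2; }
  return a; // wrong when cond is true
}

// Solution 1: rename the block's temporary.
function renamedInline(cond) {
  var a, $b1_a;
  a = 1;
  if (cond) { $b1_a = 2; }
  return a;
}

// Solution 2: keep the clashing block as an IIFE; correct, but it pays
// for a function call again.
function iifeInline(cond) {
  var a;
  a = 1;
  if (cond) { (function () { var a; a = 2; })(); }
  return a;
}
```

`naiveInline(true)` answers 2 instead of 1, while both other variants
answer 1. Inline JS inside the block that mentions `a` would silently
refer to the wrong variable after the renaming in solution 1.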



For the curious, the code of ImpCompiler can be found at
https://github.com/herby/amber/tree/newcompiler.



Herby



* if a lazy variable is used inside an expression, it is "bound", that
is, it is presumed to be used and needs no actual materialization. Also,
a variable that is assigned an immutable value is treated as bound (it
is not eagerly evaluated and stored, it just continues to be an alias
for its value).
** The big case against inline JS in blocks is their inlining alone - if
a block is inlined, "return" means something different than when it is
not. Or, put the other way - there is simply no possible way to
inline blocks containing inline JS, because of "return".
*** optional appears a few times, mainly so as not to clutter the code
with unneeded things. Such values are assigned to any target except the
blank "just do it", where they are omitted. If there were a '^' before
the ifTrue: example above, the code produced would be `$0=true; if (...$0 is
boolean...) if ($0) { return 42; } else { return nil; } else { return
smalltalk.send($0, "_ifTrue_", [function(){ throw $early=[42]; }]) }`,
so in this case nil would not be omitted (since its value is wanted).
Re: ImpCompiler - introduction, status, problems

Nicolas Petton
Herby Vojčík <[hidden email]> writes:

Very nice! Now I have to read the code :)

One question: do you compile methods with contexts, i.e.

function() {
  var self=this;
  smalltalk.pushStack(self, ...);
  ...
  ...
  smalltalk.popStack();
  return $returnValue;
}

I would really like to have context-aware methods, and only create
context objects lazily when needed (use of `thisContext`).

When I find a bit of time, I will modify the current compiler to do this.
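
One possible reading of this, as a minimal sketch: `pushStack` and
`popStack` are the names used above, but their behavior here, the
`thisContext` helper, the `contextsCreated` counter and the selector
`_answer` are all assumptions for illustration.

```javascript
// Hypothetical runtime: the stack holds cheap frame records, and a real
// context object is only built the first time someone asks for it.
const smalltalk = {
  stack: [],
  contextsCreated: 0,
  pushStack(receiver, selector) {
    this.stack.push({ receiver: receiver, selector: selector, context: null });
  },
  popStack() {
    this.stack.pop();
  },
  thisContext() {
    const frame = this.stack[this.stack.length - 1];
    if (!frame.context) {
      frame.context = { receiver: frame.receiver, selector: frame.selector };
      this.contextsCreated++;
    }
    return frame.context;
  },
};

// A compiled method in the shape shown above, using `thisContext` once.
function compiledMethod() {
  var self = this, $returnValue;
  smalltalk.pushStack(self, "_answer");
  try {
    $returnValue = smalltalk.thisContext().selector;
  } finally {
    smalltalk.popStack(); // keep the stack balanced even on exceptions
  }
  return $returnValue;
}
```

Calling `compiledMethod.call({})` answers `"_answer"` and leaves the
stack empty; methods that never ask for `thisContext` only pay for the
push and pop.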

Cheers,
Nico

Re: ImpCompiler - introduction, status, problems

Herby Vojčík
[hidden email] wrote:

> Herby Vojčík<[hidden email]>  writes:
>
> Very nice! Now I have to read the code :)
>
> 1 question: Do you compile methods with contexts, ie
>
> function() {
>    var self=this;
>    smalltalk.pushStack(self, ...);
>    ...
>    ...
>    smalltalk.popStack();
>    return $returnValue;
> }

That's the next step (after solving name-clash and one additional little
thing which is trivial to do, just needs to be done, and I'll do it
after nameclash is solved).

> I would really like to have context aware methods, and only create
> context objects lazily when needed (use of `thisContext`).

That in fact can be done in a totally lazy manner, by just having
continuations and recording the pc in the method. Basically this way:
- run methods without any overhead except try/catch (try costs
virtually nothing if the catch is not triggered), but update the pc
while running
- if thisContext is wanted, throw a specific exception
- catch it in every method (since it's specific), saving the pcs
- at the bottom, re-run the methods, but now creating actual contexts
and of course continuing from the interrupted pcs
- now, thisContext is available
- subsequent calls again need not create contexts; only when
thisContext is needed again is the stack rewound down to the point where
the last context actually exists and re-run to create the top part.
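
A toy model of these steps, under heavy assumptions: methods are
written as arrays of step functions so that a pc exists at all, each
frame saves its locals together with its pc, and (unlike the last step
above) the re-run simply stays in context-creating mode until the
bottom method returns. All names are invented for the sketch.

```javascript
// The "specific exception"; it collects the interrupted frames on the way down.
class NeedContext extends Error {
  constructor() {
    super("thisContext requested");
    this.frames = [];
  }
}

let saved = null;    // frames to resume, bottom first; set only while re-running
let contexts = null; // real context stack; non-null only while re-running

function run(name, steps, locals) {
  let frame = { name: name, pc: 0, locals: locals };
  if (saved && saved.length) frame = saved.shift(); // resume interrupted frame
  if (contexts) contexts.push(frame);
  try {
    while (frame.pc < steps.length) {
      steps[frame.pc](frame.locals); // the pc is only advanced after the step
      frame.pc++;
    }
    return frame.locals.result;
  } catch (e) {
    if (e instanceof NeedContext) e.frames.unshift(frame); // save pc + locals
    throw e;
  } finally {
    if (contexts) contexts.pop();
  }
}

function thisContext() {
  if (!contexts) throw new NeedContext();
  return contexts[contexts.length - 1];
}

// The bottom: on NeedContext, re-run with contexts from the saved pcs.
function start(name, steps, locals) {
  try {
    return run(name, steps, locals);
  } catch (e) {
    if (!(e instanceof NeedContext)) throw e;
    saved = e.frames;
    contexts = [];
    try {
      return run(name, steps, locals);
    } finally {
      saved = null;
      contexts = null;
    }
  }
}

const log = [];
const innerSteps = [
  function (l) { log.push("inner step 0"); },
  function (l) { l.result = thisContext().name; },
];
const outerSteps = [
  function (l) { log.push("outer step 0"); },
  function (l) { l.result = "ctx of " + run("inner", innerSteps, {}); },
];
const answer = start("outer", outerSteps, {});
```

The interrupted step is executed again on the re-run, so in this model a
step must not have side effects before the point where it can be
interrupted. The steps that completed before the interruption run only
once.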

> When I find a bit of time, I will modify the current compiler to do this.

Look at my branch novel - it can probably serve as the starter (I
actually made a few steps in that direction, so that ImpCompiler is not
that far away).
