OpalCompiler evaluate speed

Nicolas Cellier
I have an ArbitraryPrecisionFloatTest doing an exhaustive test of printing and re-evaluating all positive half-precision floats.

That's about 2^15, i.e. a roughly 32k-iteration loop which evaluates snippets like

    (ArbitraryPrecisionFloat readFrom: '1.123' readStream numBits: 10)

The test was naively written with Compiler evaluate:, so it was using the legacy Compiler.
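
Schematically, each iteration round-trips one value like this (a sketch of the test's shape, not its actual source; variable names are mine):

    | f printed reread |
    f := ArbitraryPrecisionFloat readFrom: '1.123' readStream numBits: 10.
    printed := f printString.
    reread := Compiler evaluate:
        '(ArbitraryPrecisionFloat readFrom: ', printed printString, ' readStream numBits: 10)'.
    reread = f  "the test asserts this round-trip for every positive half-precision float"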

If I rewrite it as self class compiler evaluate:, the test times out.
Let's see how big the slowdown is:

    [ ArbitraryPrecisionFloatTest new testPrintAndEvaluate  ] timeToRun.
    -> 3s with legacy Compiler
    -> 14s with OpalCompiler

It's not unexpected that reifying an intermediate representation (IR) has a cost, but here 4.5x is a bit too much...
This test already accounted for 1/4 of the total test duration (3s out of 12s).
With Opal, the total duration roughly doubles: the rest of the suite takes about 9s, so 9s + 14s = 23s.

So let's analyze the hot spot with:

    MessageTally  spyOn: [ ArbitraryPrecisionFloatTest new testPrintAndEvaluate  ].

(I didn't use AndreasSystemProfiler because its output seems a bit garbled; no matter, since the primitives do not account for much here, a MessageTally will do the job.)

I first see a hot spot which does not seem that necessary:

      |    |24.6% {3447ms} RBMethodNode(RBProgramNode)>>formattedCode

From the comments I understand that the AST-based machinery requires a method pattern (DoIt) and an explicit return (^), but this expensive formatting seems too much for just evaluating. I think that we should change that.
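
For plain evaluation, the pattern and the return could presumably be obtained by string concatenation rather than by pretty-printing the AST, along these lines (a sketch, not how OpalCompiler builds its DoIt source today):

    | expressionSource doItSource |
    expressionSource := '3 + 4'.
    doItSource := 'DoIt ^ [ ', expressionSource, ' ] value'.
    "compile doItSource as a one-shot method; no AST formatting required"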

Then comes:

      |    |20.7% {2902ms} RBMethodNode>>generate:

which is split in two halves, AST->IR and IR->bytecode:

      |    |  |9.3% {1299ms} RBMethodNode>>generateIR
 
      |    |  |  |11.4% {1596ms} IRMethod>>generate:

But then I see this cost a second time, which also leaves room for progress:

      |                |10.9% {1529ms} RBMethodNode>>generateIR

      |                |  |12.9% {1814ms} IRMethod>>generate:

The first is in RBMethodNode>>generateWithSource, the second in OpalCompiler>>compile.

Last comes the parse time (sourceCode -> AST):

      |                  13.2% {1858ms} OpalCompiler>>parse

Along with semantic analysis:

      |                  6.0% {837ms} OpalCompiler>>doSemanticAnalysis

-----------------------------------

For comparison, the legacy Compiler decomposes into:

      |        |61.5% {2223ms} Parser>>parse:class:category:noPattern:context:notifying:ifFail:

which more or less covers parse time + semantic analysis time.
That means Opal does fair work at this stage.

Then, the direct AST->bytecode phase is:

     |      16.9% {609ms} MethodNode>>generate

IR costs almost 5x on this phase, but we know it's the price to pay for the additional features it potentially offers. If only we did it once...

And that's all for the legacy one...

--------------------------------------

This little exercise shows that a 2x acceleration of OpalCompiler evaluate seems achievable:
- simplify the uselessly expensive formatted code
- generate bytecodes once, not twice

Then it will be a bit more than 2x slower than legacy, which is a better trade for the superior features Opal may yet bring.

It would be interesting to carry out the same analysis on method compilation.

Re: OpalCompiler evaluate speed

Nicolas Cellier


2017-11-21 14:19 GMT+01:00 Nicolas Cellier <[hidden email]>:
> [earlier message quoted in full; snipped]
 
Digging further, here is what I find:

compile sends generate: and answers a CompiledMethod.
translate sends compile but throws the CompiledMethod away, and just answers the AST.
Most senders of translate will also send generate: (thus we quite often generate: twice, losing a 2x factor in compilation).
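
In code, the wasteful sequence looks roughly like this (a sketch using the selectors from the profile above; the receiver setup and the trailer argument are my assumptions):

    | compiler ast |
    compiler := Smalltalk compiler source: '3 + 4'; yourself.
    ast := compiler translate.  "parses, then generates a CompiledMethod that is thrown away"
    ast generate: CompiledMethodTrailer empty.  "and here bytecode is generated a second time"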

A 2x gain is a huge gain when installing big code bases, especially if the custom is to throw the image away and reconstruct it.
Even if a bot does the job, it does it for twice as many watts, and in the end we're still waiting for the bot.

However, before changing anything, further clarification is required:
translate does one more thing: it catches ReparseAfterSourceEditing and retries compilation (once).
So my question: are there cases where generate: will cause ReparseAfterSourceEditing?
That could happen in the generation phase if some bytecode limit is exceeded and an interactive handler corrects the code...
I did not see any such condition, but the code base is huge...

Re: OpalCompiler evaluate speed

Ben Coman


On 22 November 2017 at 05:49, Nicolas Cellier <[hidden email]> wrote:
> [earlier messages quoted in full; snipped]
>
> However, before changing anything, further clarification is required:
> translate does one more thing: it catches ReparseAfterSourceEditing and retries compilation (once).
> So my question: are there cases where generate: will cause ReparseAfterSourceEditing?

I don't know the full answer about other cases, but I can provide the background on why ReparseAfterSourceEditing was added.

IIRC, a few years ago, with the move to an AST-based system, there was a problem with syntax highlighting: the AST referenced its original source, which caused highlighting offsets once the source was modified in the editor.
Trying to work backwards from the modified source to update all the AST elements' source locations proved intractable.
The workaround I found was to move only in a forward direction, regenerating the AST from source on every keystroke.
Performance was acceptable, so this became the permanent solution.

I don't have access to an image to check, but you should find ReparseAfterSourceEditing raised in only one location, near the editor's #changed:.
Maybe this should activate only for interactively modified code, and be disabled/bypassed when bulk-loading code; see the sketch below.
For testing purposes, commenting it out should not harm the system, just produce visual artifacts in syntax highlighting.
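
That gating might look something like this at the single signalling site (a sketch; the method and the isInteractive test are hypothetical, not existing API):

    signalReparseIfInteractive
        "Only trigger the reparse cycle for interactive edits; bulk code loading would skip it."
        self isInteractive
            ifTrue: [ ReparseAfterSourceEditing signal ]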

 
> That could happen in the generation phase if some bytecode limit is exceeded and an interactive handler corrects the code...
> I did not see any such condition, but the code base is huge...

At worst, the impact should only be a temporary visual artifact, corrected on the next keystroke.
(Unless ReparseAfterSourceEditing has been adopted for something other than its original purpose, but I'd guess not.)

cheers -ben


Re: OpalCompiler evaluate speed

Stephane Ducasse-3
In reply to this post by Nicolas Cellier
Thanks, this is good.
I think that Opal deserves more energy.
We are busy right now, but Marcus will certainly be interested.
It will also speed up part of the bootstrap.

Stef

On Tue, Nov 21, 2017 at 2:19 PM, Nicolas Cellier
<[hidden email]> wrote:

> [earlier message quoted in full; snipped]


Re: OpalCompiler evaluate speed

Nicolas Cellier
In reply to this post by Ben Coman


2017-11-22 0:31 GMT+01:00 Ben Coman <[hidden email]>:
> [earlier messages and Ben's reply quoted in full; snipped]

Hi Ben,
Thanks for the information.
We must keep ReparseAfterSourceEditing; it does its job.

But it just sounds like we have an inversion:

translate (source code -> AST) calls compile (source code -> AST -> bytecode in a CompiledMethod).

I would expect the other way around: if we want to compile, we need to translate first.
If we want to translate, we don't really need to compile, unless there's a hidden reason...
Thus my question: is there a hidden reason?
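
The layering I would expect, sketched with the selectors discussed in this thread (the bodies are my assumptions, semantic analysis is elided, and the real code would limit the retry to once):

    translate
        "source -> AST only; no bytecode generation"
        ^ [ self parse ]
            on: ReparseAfterSourceEditing
            do: [ :ex | ex retry ]

    compile
        "source -> AST -> CompiledMethod, layered on top of translate"
        ^ self translate generate: CompiledMethodTrailer empty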

Re: OpalCompiler evaluate speed

Clément Béra
Hi Nicolas.

Just some comments:

Another thing you can try is to remove the allocation of Opal's IR. It seems people use the IR only through the IRBuilder, so the API can be kept but made to generate bytecode directly instead of IR then bytecode. Removing those allocations would speed things up. It means merging IRFix / IRTranslator / IRBytecodeGenerator somehow and having the IRBuilder API directly call the new resulting merged class.
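
For reference, the IRBuilder API surface in question looks like this (adapted from memory of the Pharo documentation example; details may have drifted):

    | ir method |
    ir := IRBuilder new
        pushLiteral: #hello;
        returnTop;
        ir.
    method := ir compiledMethod.
    method valueWithReceiver: nil arguments: #()  "-> #hello"

The same cascade could emit bytecode immediately instead of first allocating IR nodes.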

Another thing is that when Opal became the default compiler, I evaluated the speed and saw it was slower, but when loading large projects the loading time was dominated by Monticello / source reading / source loading, and the compilation time was overall not that significant (< 20% of the time). I don't know if this is still the case with Git. I currently have problems when editing some large methods (it seems the IDE compiles the method on each keystroke...) and when doing OpalCompiler recompileAll (which I do often since I edit bytecode sets), but otherwise the performance of Opal seems OK. Evaluate performance may be relevant in some cases, but I've never found such cases outside of the IDE in production.

Best !

Clement


On Thu, Nov 23, 2017 at 9:41 PM, Nicolas Cellier <[hidden email]> wrote:
> [earlier messages quoted in full; snipped]



--
Clément Béra
Pharo consortium engineer
Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq

Re: OpalCompiler evaluate speed

Nicolas Cellier


2017-11-23 23:37 GMT+01:00 Clément Bera <[hidden email]>:
> Another thing you can try is to remove the allocation of Opal's IR. It seems people use the IR only through the IRBuilder [snip]


Yes, I was thinking of this optimization: use the IR only when we need advanced tools/instrumentation.

> Another thing is that when Opal became the default compiler, I evaluated the speed and saw it was slower, but when loading large projects the loading time was dominated by Monticello / source reading / source loading [snip]


Right, my case was generating strings, so the Compiler was the main contributor.
For the more general case, it should be benched.
I was thinking of a tool for checking the example comments ("some code example >>> some result").
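
Such a checker could be as simple as this (a sketch; checkExampleComment: is a hypothetical helper and the >>> splitting is naive):

    checkExampleComment: aString
        "Evaluate both sides of 'expression >>> expected' and compare the results."
        | cut expression expected |
        cut := aString indexOfSubCollection: '>>>'.
        expression := aString first: cut - 1.
        expected := aString allButFirst: cut + 2.
        ^ (self class compiler evaluate: expression)
            = (self class compiler evaluate: expected)

For instance, checkExampleComment: '3 + 4 >>> 7' would answer true.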



Re: OpalCompiler evaluate speed

Marcus Denker-4
In reply to this post by Stephane Ducasse-3


> On 22 Nov 2017, at 21:14, Stephane Ducasse <[hidden email]> wrote:
>
> Thanks, this is good.
> I think that Opal deserves more energy.
> We are busy right now, but Marcus will certainly be interested.

Yes, sadly I am over capacity. I cannot react to (or even read) every email sent to the mailing list
anymore.

I have marked this to check for later.

        Marcus



Re: OpalCompiler evaluate speed

Marcus Denker-4
In reply to this post by Clément Béra


> On 23 Nov 2017, at 23:37, Clément Bera <[hidden email]> wrote:
>
> Another thing you can try is to remove the allocation of Opal's IR. [snip]
>
> Another thing is that when Opal became the default compiler, I evaluated the speed and saw it was slower [snip]
>

Yes, the IR came over from the existing design of the ClosureCompiler for Squeak that Opal was initially based on… I liked it at first because I was looking into bytecode manipulation back then, and for that it was quite nice.

But for compiling, it adds a layer that is not really needed. If I were to do it again, I would not use this IR level in the compiler but instead do something simpler and use the BytecodeBuilder directly.

But as Clément notes: it is not that much time if you consider the whole use case of compiling, so this did not seem like much of a problem (e.g. for compiling the image, we were happy with a factor of 2; I did not benchmark recently, but I think this is what we reached).

For the frontend (compile to AST + name analysis), it is fast enough to do syntax highlighting in the browser instead of a specialised parser, so that part is quite ok.

One other thing that needs cleanup: the API of the OpalCompiler class. It tries to be compatible with the old API (e.g. failBlock) and got quite hacked to death over time… we should clean that up. The complexity of #evaluate is partly due to that, I think.

        Marcus

Re: OpalCompiler evaluate speed

Nicolas Cellier


2017-11-24 10:54 GMT+01:00 Marcus Denker <[hidden email]>:


> [nested quote of Clément's comments snipped]
>
> For the frontend (compile to AST + name analysis), it is fast enough to do syntax highlighting in the browser instead of a specialised parser, so that part is quite ok.

Agreed.

> One other thing that needs cleanup: the API of the OpalCompiler class. It tries to be compatible with the old API (e.g. failBlock) and got quite hacked to death over time… we should clean that up. The complexity of #evaluate is partly due to that, I think.
>
>         Marcus

Yes, that was exactly my feeling.


Re: OpalCompiler evaluate speed

Stephane Ducasse-3
Couldn't we set up a task force to clean up the compiler?

We need a better infrastructure for the tooling.

Stef

On Fri, Nov 24, 2017 at 11:35 AM, Nicolas Cellier
<[hidden email]> wrote:

> [earlier messages quoted in full; snipped]


Re: OpalCompiler evaluate speed

Marcus Denker-4
In reply to this post by Marcus Denker-4


> On 24 Nov 2017, at 10:54, Marcus Denker <[hidden email]> wrote:
>
>> [Clément's comments quoted in full; snipped]
>
(old mail… late answer).

In the meantime we found out that #recompileAll was recompiling metaclasses twice; with that fixed, recompiling is a little bit faster (from around one minute to 50 seconds on my machine).
>
> One other thing that needs cleanup: the API of the OpalCompiler class. It tries to be compatible with the old API (e.g. failBlock) and got quite hacked to death over time… we should clean that up. The complexity of #evaluate is partly due to that, I think.
>

As a first step, I changed #evaluate to not store sources in the trailer; this means that we no longer pretty-print the source.

https://github.com/pharo-project/pharo/pull/880

The only downside is that one (not very practically relevant) test failed (#testSourceNodeOptimizedBlock). I think we can ignore that for now and fix it if it turns out to be a problem (the test works fine when rewritten to compile a full method).

        Marcus




Re: OpalCompiler evaluate speed

monty-3
That's great, Marcus. Lots of code depends on the compiler for dynamic evaluation, including compilers that translate other languages to Smalltalk (like the XPath lib), so a faster compiler isn't insignificant.

I know optimizing it takes time and probably isn't a high priority, but we could at least commit to not letting it get slower. For example, forget about replacing the RB parser with a PetitParser-based one (which some have expressed interest in), since it looks like it would be slower, even with the PetitParser compiler: http://scg.unibe.ch/research/petitcompiler

> Sent: Tuesday, February 13, 2018 at 12:04 PM
> From: "Marcus Denker" <[hidden email]>
> To: "Pharo Development List" <[hidden email]>
> Subject: Re: [Pharo-dev] OpalCompiler evaluate speed
>
> [quoted message snipped]