A BlockClosure optimization


A BlockClosure optimization

Igor Stasenko
 
Hello,

I just thought that we could optimize the closure-copy mechanism
to reuse a closure (a BlockClosure instance) that was created before
for the same context.

The optimization can be illustrated by the following code.

Suppose you have a method which uses a loop:

myMethod

 1 to: 100 do: [:i |
   dict at: i ifAbsent: [ foo bar ] ]

The idea is to copy the closure from the method's literal frame just once,
store it into a temp,
and then reuse it like the following:

myMethod
| closure |
 1 to: 100 do: [:i |
   dict at: i ifAbsent: (closure ifNil: [ closure := [ foo bar ] ] ) ]

----------

A simple benchmark shows that we could gain from it:

[ 1000000 timesRepeat: [ [ 1+1] value ] ] timeToRun
670

[
| closure |  closure := nil.
1000000 timesRepeat: [
        (closure ifNil: [ closure := [ 1+1] ]) value ]
] timeToRun
518

As you can see, even implemented in Smalltalk (without any changes to
the VM) it shows a significant performance boost.

Of course, if we put something that loads the processor with real work
in place of the trivial [1+1],
the difference will be less significant.

But beyond that lies the fact that copying the closure object each
time means a memory allocation,
and hence leads to more frequent GC.
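The allocation argument can be made concrete with a small sketch. This is not Smalltalk; it is a Python analogue (all names are mine, purely illustrative) of the cache-on-first-use pattern, showing that the naive loop allocates one closure per iteration while the cached loop allocates exactly one:

```python
def per_iteration(n):
    """Allocate a fresh closure on every iteration (the current behaviour)."""
    made = []
    for _ in range(n):
        absent = lambda: "missing"   # a new function object each pass
        made.append(absent)
    return made

def cached(n):
    """Create the closure once, on first use, then reuse it (the proposal)."""
    made = []
    closure = None
    for _ in range(n):
        if closure is None:          # the 'closure ifNil: [...]' analogue
            closure = lambda: "missing"
        made.append(closure)
    return made

fresh, reused = per_iteration(3), cached(3)
assert len({id(f) for f in fresh}) == 3   # three distinct allocations
assert len({id(f) for f in reused}) == 1  # a single shared allocation
```

The saved allocations are exactly the objects the GC would otherwise have to reclaim.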




--
Best regards,
Igor Stasenko AKA sig.

Re: A BlockClosure optimization

Eliot Miranda-2
 
Hi Igor,

On Fri, May 14, 2010 at 10:14 PM, Igor Stasenko <[hidden email]> wrote:

Hello,

i just thought, that we could optimize a closure-copy mechanism
to reuse a closure (BlockClosure instance), which were created before
for same context.

    that's a good one.  Also good is precomputing closures for blocks that don't capture their dynamic environment (don't close over any variables and don't include an ^-return; VW parlance "clean blocks").  Another one, which requires a new bytecode set/VM, is to not reify the current context for blocks that don't contain ^-returns (VW parlance "copying blocks").  But these last two should be preferences since they affect debugging (within a block so optimized one can't discover its origin).

(VW parlance for normal blocks is "full blocks"; all blocks in my closure compiler are full, so the current context must be reified.  That's not an issue in the non-Cog VMs, as it's already there, but it is an issue in a faster VM: it often means two allocations instead of one.)

 
As you can see, even implemented in smalltalk (without any changes to
VM) it shows
a significant performance boost.

That's what's nice about this optimization.  It doesn't require any VM modifications ;)

 
But apart from this, lying the fact, that copying closure object each
time means memory allocation,
and hence leads to more frequent GC.

What real codes have you seen the costs in?  I think they're there (VisualWorks went to some effort to reduce costs using the two other optimizations I described) but how big?  In any case you should implement this and see whether any useful benchmarks (e.g. system recompilation) show measurable speedups.


--
Best regards,
Igor Stasenko AKA sig.


Re: A BlockClosure optimization

Igor Stasenko

On 15 May 2010 19:35, Eliot Miranda <[hidden email]> wrote:

> That's what's nice about this optimization.  It doesn't require any VM modifications ;)
>

Yes, it doesn't.  A compiler can detect whether a closure-push bytecode
is found inside a loop (a bytecode block having a backwards jump),
and then add a temp and emit the bytecodes to (re)use that temp.

A potential VM modification could be to add a bytecode like
"pushBlockClosure: litIndex orReuseTemp: tempIndex",
which would be equivalent to the piece: closure ifNil: [ closure := [ 1+1 ] ].
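As a rough sketch of what such a bytecode might do, here is a toy interpreter fragment in Python (the bytecode, the Frame class and all names are hypothetical, standing in for the real VM structures):

```python
class Frame:
    """Toy method activation: literal frame, temp slots, value stack."""
    def __init__(self, literals, num_temps):
        self.literals = literals
        self.temps = [None] * num_temps
        self.stack = []

class BlockTemplate:
    """Stands in for the compiled block stored in the literal frame."""
    def copy(self):
        return object()  # a fresh 'BlockClosure'

def push_block_closure_or_reuse_temp(frame, lit_index, temp_index):
    # Hypothetical 'pushBlockClosure: litIndex orReuseTemp: tempIndex':
    # equivalent to 'closure ifNil: [ closure := <block> ]', then push.
    if frame.temps[temp_index] is None:
        # one allocation, on the first pass of the loop only
        frame.temps[temp_index] = frame.literals[lit_index].copy()
    frame.stack.append(frame.temps[temp_index])

frame = Frame([BlockTemplate()], num_temps=1)
for _ in range(3):
    push_block_closure_or_reuse_temp(frame, 0, 0)
# all three pushes delivered the same cached closure
assert frame.stack[0] is frame.stack[1] is frame.stack[2]
```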

Too bad this technique can't be used for nested blocks, since they
have to use a different 'outerContext'
per each activation of the outer closure. :(

> What real codes have you seen the costs in?  I think they're there (VisualWorks went to some effort to reduce costs using the two other optimizations I described) but how big?  In any case you should implement this and see whether any useful benchmarks (e.g. system recompilation) show measurable speedups.

I have never explored the bytecode emission code in
Encoder / MethodNode / BytecodeAgnosticMethodNode.
It is hellishly complex and it's easy to get lost there. :)
But I could try.



--
Best regards,
Igor Stasenko AKA sig.

Re: A BlockClosure optimization

Eliot Miranda-2
 


On Sat, May 15, 2010 at 10:02 AM, Igor Stasenko <[hidden email]> wrote:

Too bad, this technique can't be used for a nested blocks, since they
have to use different 'outerContext'
per each activation of outer closure. :(

Is that true?  Isn't the real issue that if any niladic block contains an ^-return, then that block and any blocks it is nested within cannot be shared?  It would be good to try and state the invariants in formal English before you implement.  The implementation will be much better as a result.


I never went to exploring a bytecode emission code in
Encoder / MethodNode / BytecodeAgnosticMethodNode.
It is hellishly complex and its easy to get lost there. :)
But i could try.

That's not what I meant.  I simply meant that you need to measure the effect of your optimization (caching blocks used in loops in a temp) on real use cases, e.g. recompiling the entire system, which is a decent benchmark.




--
Best regards,
Igor Stasenko AKA sig.


Re: A BlockClosure optimization

Igor Stasenko

On 15 May 2010 20:12, Eliot Miranda <[hidden email]> wrote:

> Is that true?  Isn't the real issue that if any niladic block contains an ^-return then that block any any blocks it is nested within cannot be shared?  It would be good to try and state the invariants in formal English before you implement.  The implementation will be much better as a result.

My English is not good enough to explain what I mean :)

I can show this with code:

1 to: 1000 do: [ :i |
    dict at: i ifAbsent: [ | x |
        x := self foo: i.
        dict2 at: i ifAbsent: [ x ] ] ]

Here, the outer closure "[ | x | ... ]" can use caching,
while the inner "[ x ]" can't, because it uses a temp created in the
outer block.
This is because each activation of the outer block creates a new context.

Sure thing, for blocks which don't refer to the state of the outer
context it should still be possible, and even when the nested block
refers to the state of the method's context (such as self or its temps):

1 to: 1000 do: [ :i |
    dict at: i ifAbsent: [
        dict2 at: i ifAbsent: [ self foo: i ] ] ]

could be written as:

| {i} closureInner closureOuter |

closureInner := [self foo: i].
closureOuter := [ dict2 at: i ifAbsent: closureInner ].
1 to: 1000 do: [ :i |  dict at: i ifAbsent: closureOuter ].


Yes, I know that the above is incorrect code,
but take into account that the temp 'i' is actually moved to the
method's scope, because the #to:do: loop is inlined by the compiler;
hence I placed {i} to indicate that.
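The distinction drawn above can be sketched outside Smalltalk. Below is a Python analogue (all names mine, illustrative only): a closure that reaches only method-scope state can be created once and reused across iterations (mimicking the {i}-hoisting, with a one-slot holder standing in for the shared method temp), while a closure over a temp of an enclosing block must be rebuilt per activation:

```python
def method_scope_example():
    """Closure created once outside the loop, reading only method scope."""
    i_holder = [0]                        # 'i' hoisted to method scope,
                                          # as the inlined #to:do: would do
    inner = lambda: ("foo", i_holder[0])  # the [ self foo: i ] analogue
    seen = []
    for i in range(1, 4):
        i_holder[0] = i
        seen.append(inner())              # same closure reused every pass
    return seen

def outer_activation():
    """Each call plays one activation of the outer block with its own '| x |'."""
    x = object()                          # fresh temp per activation
    return lambda: x                      # must capture *this* activation's x

assert method_scope_example() == [("foo", 1), ("foo", 2), ("foo", 3)]
a, b = outer_activation(), outer_activation()
assert a() is not b()   # each activation needs its own closure instance
```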


> That's not what I meant.  I simply meant that you need to measure the effect of your optimization (caching blocks used in loops in a temp) on real use cases such as e.g. recompiling the entire system, which is a decent benchmark.

Please elaborate on what you mean, then. Do you mean manually rewriting
the code in appropriate places in the Compiler toolchain, where it's
possible?




--
Best regards,
Igor Stasenko AKA sig.

Re: A BlockClosure optimization

Eliot Miranda-2
 


On Sat, May 15, 2010 at 11:02 AM, Igor Stasenko <[hidden email]> wrote:

Please, elaborate, what you mean then. Do you mean, manually rewrite
the code in appropriate places
in Compiler toolchain, where its possible?

No.  I mean: measure how long it takes to recompile the system *without* your optimization, then recompile the system using your optimization, then measure how long it takes to recompile the system *with* your optimization.  I.e. measure how much effect your optimization has on real benchmarks.  Then we can tell if it is worth using in general.

best
Eliot





--
Best regards,
Igor Stasenko AKA sig.


Re: A BlockClosure optimization

Igor Stasenko

On 16 May 2010 04:33, Eliot Miranda <[hidden email]> wrote:

> No  I mean measure how long it takes to recompile the system *without* your optimization, then recompile the system using your optimization, then measure how long it takes to recompile the system *with* your optimization.  i.e. I mean measure how much effect your optimization has on real benchmarks.  Then we can tell if it is worth using in general.

Right, but in order to "recompile the system using my optimization" I
need to implement it first :)

And it sounds like you believe that it can be easily added; if so,
then please give me some clues on where I should start.
Maybe I am looking at the wrong place, or at the same place (as a
whole) but the wrong section.



--
Best regards,
Igor Stasenko AKA sig.

Re: A BlockClosure optimization

Eliot Miranda-2
 


On Sat, May 15, 2010 at 7:16 PM, Igor Stasenko <[hidden email]> wrote:

>> >> have to use different 'outerContext'
>> >> per each activation of outer closure. :(
>> >
>> > Is that true?  Isn't the real issue that if any niladic block contains an ^-return then that block any any blocks it is nested within cannot be shared?  It would be good to try and state the invariants in formal English before you implement.  The implementation will be much better as a result.
>>
>> My English is not good enough to explain what i mean :)
>>
>> I can show this with a code:
>>
>> 1 to: 1000 do: [ :i |    dict at: i ifAbsent: [  | x |  x := self foo:
>> i. dict2 at: i ifAbsent: [ x ] ] ]
>>
>> here, the outer closure "[  | x | ... ]" can use caching.
>> While inner " [ x ] " can't, because its using a temp, created in an
>> outer block.
>> This is because each activation of outer block creating a new context.
>>
>> Sure thing, for blocks which don't refer to the state of outer
>> context, it still should be possible.
>> And even if nested block refers to the state of method's context (such
>> as self or its temps):
>>
>> 1 to: 1000 do: [ :i |    dict at: i ifAbsent: [ dict2 at: i ifAbsent:
>> [ self foo: i ] ] ]
>>
>> could be written as:
>>
>> | {i} closureInner closureOuter |
>>
>> closureInner := [self foo: i].
>> closureOuter := [ dict2 at: i ifAbsent: closureInner ].
>> 1 to: 1000 do: [ :i |  dict at: i ifAbsent: closureOuter ].
>>
>>
>> yes, i know , that the above is incorrect code,
>> but take into account that temp 'i' actually moved to method's scope,
>> because #to:do: loop is inlined by compiler,
>> hence i placed {i}  to indicate that.
>>
>>
>> >>
>> >> >>
>> >> >> Of course, if we put something, which loads processor by real work,
>> >> >> instead of just [1+1],
>> >> >> the difference will be less significant.
>> >> >>
>> >> >> But apart from this, lying the fact, that copying closure object each
>> >> >> time means memory allocation,
>> >> >> and hence leads to more frequent GC.
>> >> >
>> >> > What real codes have you seen the costs in?  I think they're there (VisualWorks went to some effort to reduce costs using the two other optimizations I described) but how big?  In any case you should implement this and see whether any useful benchmarks (e.g. system recompilation) show measurable speedups.
>> >>
>> >> I never went to exploring a bytecode emission code in
>> >> Encoder / MethodNode / BytecodeAgnosticMethodNode.
>> >> It is hellishly complex and its easy to get lost there. :)
>> >> But i could try.
>> >
>> > That's not what I meant.  I simply meant that you need to measure the effect of your optimization (caching blocks used in loops in a temp) on real use cases such as e.g. recompiling the entire system, which is a decent benchmark.
>>
>> Please, elaborate, what you mean then. Do you mean, manually rewrite
>> the code in appropriate places
>> in Compiler toolchain, where its possible?
>
> No  I mean measure how long it takes to recompile the system *without* your optimization, then recompile the system using your optimization, then measure how long it takes to recompile the system *with* your optimization.  i.e. I mean measure how much effect your optimization has on real benchmarks.  Then we can tell if it is worth using in general.

right, but in order to "recompile the system using my optimization" i
need to implement it first :)

And it sounds like you believe that it can be easily added, if so ,
then please give me a clues, where i should start from.
Maybe i am looking at wrong place, or at same place (as a whole), but
wrong section.

Well, it needs to be part of the closure analysis which starts in BytecodeAgnosticMethodNode>>generate: where it sends self ensureClosureAnalysisDone.  To understand the closure analysis read my blog post and the code.  The blog post is a little out of date now but things haven't changed too much.  There's a long comment in BlockNode>>analyseArguments:temporaries:rootNode:.  To avoid wrecking the compiler while you work on it consider cloning it, using e.g. Encoder>>globalSourceRanges (which is used in Environment>>rewriteSourceForSelector:inClass:using:) to rename the relevant globals.  You can see the use of fixTemp: and BytecodeAgnosticMethodNode>>primitiveErrorVariableName for how to allocate a temp (provided there are spares available).

There are other options (the NewCompiler, OMeta etc) but the above may be the most direct.
 

> best
> Eliot



--
Best regards,
Igor Stasenko AKA sig.


Re: A BlockClosure optimization

Igor Stasenko

On 16 May 2010 07:14, Eliot Miranda <[hidden email]> wrote:

>
> Well, it needs to be part of the closure analysis which starts in BytecodeAgnosticMethodNode>>generate: where it sends self ensureClosureAnalysisDone.  To understand the closure analysis read my blog post and the code.  The blog post is a little out of date now but things haven't changed too much.  There's a long comment in BlockNode>>analyseArguments:temporaries:rootNode:.  To avoid wrecking the compiler while you work on it consider cloning it, using e.g. Encoder>>globalSourceRanges (which is used in Environment>>rewriteSourceForSelector:inClass:using:) to rename the relevant globals.  You can see the use of fixTemp: and BytecodeAgnosticMethodNode>>primitiveErrorVariableName for how to allocate a temp (provided there are spares available).
> There are other options (the NewCompiler, OMeta etc) but the above amy be the most direct.
>

Thanks.

Just one side note.
Testing whether an optimization improves the run-time characteristics
of the image by recompiling everything is an unfair macro-benchmark,
because the optimization's implementation could introduce a slowdown
into the compiler itself, simply because it has to do some additional
analysis during compilation.
So you can have a slower compiler which produces faster code,
or a faster compiler which produces slower code.
It is therefore misleading to use compilation as a macro-benchmark,
because that way you measure the run-time performance of the compiled
code mixed with the performance of a compiler that has a different
implementation.

So, I think that to see clearly whether there is any difference with
and without the new optimization, one should use macro-benchmarks
which do not involve compilation.

Or, at least, do it carefully, so as not to confuse the product of
compiling with compiling itself:
1. create a compiler subclass and implement the optimization there
2. run a compilation of everything using the new compiler, but _do
not install the compiled methods_
3. recompile the image using the new compiler and _install the methods_
4. run a compilation of everything using the new compiler again, but
_do not install the compiled methods_

Comparing only between 2 and 4 is fair, since both runs execute the
same piece of code, but compiled by two different compilers.
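As a sketch, the comparison between steps 2 and 4 might look like the
doit below. Everything named here is hypothetical except #timeToRun
(the same timing method used by the benchmarks earlier in this thread):
MyOptimizingCompiler stands for the Compiler subclass from step 1, and
BenchDriver's two selectors stand for whatever driver code actually
walks the system's methods.

```smalltalk
"Hypothetical driver for the fair benchmark; only steps 2 and 4 are timed."
| step2Time step4Time |

"Step 2: the image is still running code produced by the old compiler."
step2Time := [ BenchDriver recompileAllWithoutInstallUsing: MyOptimizingCompiler ] timeToRun.

"Step 3: install the methods produced by the new compiler (not timed)."
BenchDriver recompileAllAndInstallUsing: MyOptimizingCompiler.

"Step 4: the same work as step 2, but the compiler itself now runs as optimized code."
step4Time := [ BenchDriver recompileAllWithoutInstallUsing: MyOptimizingCompiler ] timeToRun.

"Only step2Time vs step4Time is a fair comparison: the same compiler code,
 compiled by two different compilers."
Transcript show: 'step 2: ', step2Time printString; cr.
Transcript show: 'step 4: ', step4Time printString; cr
```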


--
Best regards,
Igor Stasenko AKA sig.

Re: A BlockClosure optimization

Igor Stasenko
 
And here's the plan of what I want to try when analysing a block node:

- if the block node is inlined,
- check if it contains loops (whileTrue/whileFalse)
- check if there is a closure inside a loop
- for each such closure, define a temp in the block node's scope
(which has to be the scope of the outer block, since our block is inlined)
- replace the closure's blockNode in the AST with nodes equivalent to
the following: (temp ifNil: [ temp := <closure blockNode> ])
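In source terms (the actual transformation happens on the AST, not on
the source code), the last step turns the example from the start of
this thread into the hand-rewritten form already shown there:

```smalltalk
"Before: a fresh BlockClosure is copied from the literal frame on every iteration."
myMethod
	1 to: 100 do: [:i |
		dict at: i ifAbsent: [ foo bar ] ]

"After: the closure is created lazily on the first iteration and reused afterwards."
myMethod
	| closure |
	1 to: 100 do: [:i |
		dict at: i ifAbsent: (closure ifNil: [ closure := [ foo bar ] ]) ]
```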

--
Best regards,
Igor Stasenko AKA sig.

Re: A BlockClosure optimization

Igor Stasenko
 
Heck..
I implemented it,
but I can't benchmark it for real :(

My image contains a mix of underscore assignments and
underscore selectors, so recompiling everything automatically will fail.

I tried it in a Pharo image, since it has a consistent policy
on underscores,
but there's another problem: it is missing optimizedMessageNode in
class BlockNode,
which I used to test whether a block sits inside a loop :(
Strange, I thought that both Pharo and Squeak use the same compiler.

I attached a changeset; maybe you have an image where a full recompile
can be run? :)

You can simply do:
Preferences enable: #optimizeClosuresInLoops.
and then recompile the system.

Or use:
ClosureLoopPushTest new bench "print it"

I found that in my image there are only 269 closures (in various
methods) affected by this optimization.
This number could be increased by using the more advanced heuristics
you suggested,
but I'm a bit busy to continue on it now.




--
Best regards,
Igor Stasenko AKA sig.

closure-in-loop.2.cs (16K) Download Attachment