Question about inlining | How to access named temps in FullBlockClosure?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
25 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Question about inlining | How to access named temps in FullBlockClosure?

marcel.taeumel
Hi Eliot, hi all!

I am having issues with the inlining of #to:do: in combination with decompiling a block and accessing the temps.

Here is a piece of code that works as expected:

| result context |
result := OrderedCollection new.

(1 to: 5) do: [:each |
result add: [:object | object + each]].

context := (result at: 3) outerContext.
context namedTempAt: (context tempNames indexOf: 'each'). " == 3"

HOWEVER, using #to:do: the result changes because code gets inlined:

| result context |
result := OrderedCollection new.

1 to: 5 do: [:each |
result add: [:object | object + each]].

context := (result at: 3) outerContext.
context namedTempAt: (context tempNames indexOf: 'each'). " == 6"

What is my goal here? I want to unpack the block statement and fill in the temps. For all those [:object | object + each] in "result" I want to collect the strings:

-> 'object + 1'
-> 'object + 2'
-> 'object + 3'
-> 'object + 4'
-> 'object + 5'

Is there a way to access the values of temps when the block closures are created this way?

Best,
Marcel


Reply | Threaded
Open this post in threaded view
|

Re: Question about inlining | How to access named temps in FullBlockClosure?

Eliot Miranda-2
Hi Marcel,

On Mar 26, 2020, at 5:15 AM, Marcel Taeumel <[hidden email]> wrote:

Hi Eliot, hi all!

I am having issues with the inlining of #to:do: in combination with decompiling a block and accessing the temps.

Here is a piece of code that works as expected:

| result context |
result := OrderedCollection new.

(1 to: 5) do: [:each |
result add: [:object | object + each]].

context := (result at: 3) outerContext.
context namedTempAt: (context tempNames indexOf: 'each'). " == 3"

HOWEVER, using #to:do: the result changes because code gets inlined:

| result context |
result := OrderedCollection new.

1 to: 5 do: [:each |
result add: [:object | object + each]].

context := (result at: 3) outerContext.
context namedTempAt: (context tempNames indexOf: 'each'). " == 6"

What is my goal here? I want to unpack the block statement and fill in the temps. For all those [:object | object + each] in "result" I want to collect the strings:

-> 'object + 1'
-> 'object + 2'
-> 'object + 3'
-> 'object + 4'
-> 'object + 5'

Is there a way to access the values of temps when the block closures are created this way?

Nice one.  So the thing to realize is that, because of inlining, the second example is in fact equivalent to

| result context each |
result := OrderedCollection new.

each := 1.
[ each <= 5 ] whileTrue:
    [ result add: [:object | object + each]].

context := (result at: 3) outerContext.
context namedTempAt: (context tempNames indexOf: 'each') "= 6"

and when written like this it is clear that the search for the value will end up being as found, 6.

So then the thing to realize is that the inlining decision is wrong.  That in fact the closing over of each (the index variable of the inlined to:do:) within an uninlined inner block means that the outer block cannot be inlined without breaking the inner block.

So the next thing to determine is whether the compiler can easily be modified to identify this case and hence prevent inlining in case like these.

After that we have to assess and find out how many cases in the system (eg in trunk) would no longer be inlined and decide whether the cure is worse than the disease.  The issue here is two fold, the loss of performance due to not inlining, and the slow down to compilation due to the check(s) needed to prevent inlining in this case.

Why would adding the check potentially cause many to:[by:]do: invocations to no longer be inlined? The check is to find anywhere we see a (non-optimized) block created that closes over an index variable and is passed as an argument.  Because this is a dynamic language we cannot assume anything about the message the block is an argument to; the programmer could redefine an implementation of a method with that selector to capture the block argument just as your example does.  So we have to assume the block could be captured, and hence that the to:[by:]do: in which it occurs cannot be inlined.

Other Smalltalks have decided to live with the disease.  I think that the right decision is *not* to live with the disease, but to fix the inliner, and wait for Scorch to “do the right thing” and inline dynamically.  Scorch can inline precisely because it does indeed find out about the caller, and inline if the caller does not capture the block, and adds the dependency information to enable the optimized version to be discarded if redefinition occurs.

The first thing to do is write a test asserting the correct behaviour.

The second thing to do is learn how and where the bytecode compiler decides whether or not to inline to:[by:]do:.

The third thing to do is to modify that inlining decision to generate correct code for your case.

The fourth thing to do is then to find out how many methods are affected.

Finally we can then assess whether we want to live with the disease or not.

Best,
Marcel

Best, Eliot
_,,,^..^,,,_ (phone)


Reply | Threaded
Open this post in threaded view
|

Re: Question about inlining | How to access named temps in FullBlockClosure?

Christoph Thiede

Hi Eliot,


thanks for the elaborate explanation! Wow, even more things are getting blocked while we are longing for Scorch. :D Could you tell us when you expect to start making your blessing available for Squeak, only very rough? Are we speaking of months? Years? Decades? ;-)


Here're my two cents on the story:


The check is to find anywhere we see a (non-optimized) block created that closes over an index variable and is passed as an argument.


Hm, what will happen with this:


1 to: 5 do: [:each |

    [:object | object + each] in: [:block | result add: block]].


The left block is not an argument, but still it would be relevant to disable the inlining of the outer block. So your rule has to respect both receiver and arguments, am I right? This will increase the "collateral damage" of unnecessary slow down even further, we would also have to forgo inlining for things like {[:object | object + each] cull: 42} etc.


Or this one:


1 to: 5 do: [:each |

    [result add: thisContext sender top] value; yourself].


And finally:


1 to: 5 do: [:each |

    result add: [thisContext tempVarNamed: 'each']].


You can (almost) always crack the system, so - if we should decide to turn off inlining sometimes - however we do this, in each case, there will be a logical break at some position. In my humble opinion, this would cause more confusion than actually solve problems. There has never been given a guarantee that blocks will be not inlined. If you wish to prevent this, the simplest and clearest solution IMO is still


1 to: 5 do: [:each |
    result add: [:object | object + each]
] yourself. 


Best,

Christoph


Von: Squeak-dev <[hidden email]> im Auftrag von Eliot Miranda <[hidden email]>
Gesendet: Donnerstag, 26. März 2020 17:20:08
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] Question about inlining | How to access named temps in FullBlockClosure?
 
Hi Marcel,

On Mar 26, 2020, at 5:15 AM, Marcel Taeumel <[hidden email]> wrote:

Hi Eliot, hi all!

I am having issues with the inlining of #to:do: in combination with decompiling a block and accessing the temps.

Here is a piece of code that works as expected:

| result context |
result := OrderedCollection new.

(1 to: 5) do: [:each |
result add: [:object | object + each]].

context := (result at: 3) outerContext.
context namedTempAt: (context tempNames indexOf: 'each'). " == 3"

HOWEVER, using #to:do: the result changes because code gets inlined:

| result context |
result := OrderedCollection new.

1 to: 5 do: [:each |
result add: [:object | object + each]].

context := (result at: 3) outerContext.
context namedTempAt: (context tempNames indexOf: 'each'). " == 6"

What is my goal here? I want to unpack the block statement and fill in the temps. For all those [:object | object + each] in "result" I want to collect the strings:

-> 'object + 1'
-> 'object + 2'
-> 'object + 3'
-> 'object + 4'
-> 'object + 5'

Is there a way to access the values of temps when the block closures are created this way?

Nice one.  So the thing to realize is that, because of inlining, the second example is in fact equivalent to

| result context each |
result := OrderedCollection new.

each := 1.
[ each <= 5 ] whileTrue:
    [ result add: [:object | object + each]].

context := (result at: 3) outerContext.
context namedTempAt: (context tempNames indexOf: 'each') "= 6"

and when written like this it is clear that the search for the value will end up being as found, 6.

So then the thing to realize is that the inlining decision is wrong.  That in fact the closing over of each (the index variable of the inlined to:do:) within an uninlined inner block means that the outer block cannot be inlined without breaking the inner block.

So the next thing to determine is whether the compiler can easily be modified to identify this case and hence prevent inlining in case like these.

After that we have to assess and find out how many cases in the system (eg in trunk) would no longer be inlined and decide whether the cure is worse than the disease.  The issue here is two fold, the loss of performance due to not inlining, and the slow down to compilation due to the check(s) needed to prevent inlining in this case.

Why would adding the check potentially cause many to:[by:]do: invocations to no longer be inlined? The check is to find anywhere we see a (non-optimized) block created that closes over an index variable and is passed as an argument.  Because this is a dynamic language we cannot assume anything about the message the block is an argument to; the programmer could redefine an implementation of a method with that selector to capture the block argument just as your example does.  So we have to assume the block could be captured, and hence that the to:[by:]do: in which it occurs cannot be inlined.

Other Smalltalks have decided to live with the disease.  I think that the right decision is *not* to live with the disease, but to fix the inliner, and wait for Scorch to “do the right thing” and inline dynamically.  Scorch can inline precisely because it does indeed find out about the caller, and inline if the caller does not capture the block, and adds the dependency information to enable the optimized version to be discarded if redefinition occurs.

The first thing to do is write a test asserting the correct behaviour.

The second thing to do is learn how and where the bytecode compiler decides whether or not to inline to:[by:]do:.

The third thing to do is to modify that inlining decision to generate correct code for your case.

The fourth thing to do is then to find out how many methods are affected.

Finally we can then assess whether we want to live with the disease or not.

Best,
Marcel

Best, Eliot
_,,,^..^,,,_ (phone)


Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: Question about inlining | How to access named temps in FullBlockClosure?

Eliot Miranda-2
Hi Christoph,

On Mar 26, 2020, at 10:42 AM, Thiede, Christoph <[hidden email]> wrote:



Hi Eliot,


thanks for the elaborate explanation! Wow, even more things are getting blocked while we are longing for Scorch. :D Could you tell us when you expect to start making your blessing available for Squeak, only very rough? Are we speaking of months? Years? Decades? ;-)


That will depend on if I can find a few skilled collaborators or not (too many and communication costs will overwhelm).  Certainly not decades.  Clément has things working.  So of the order of six to twelve person months of effort.

I encourage you to look at Clément’s presentations and see if it interests you.  You might be one of those people.  This is a great PhD topic.

Here're my two cents on the story:


The check is to find anywhere we see a (non-optimized) block created that closes over an index variable and is passed as an argument.


Hm, what will happen with this:


1 to: 5 do: [:each |

    [:object | object + each] in: [:block | result add: block]].


The left block is not an argument, but still it would be relevant to disable the inlining of the outer block. So your rule has to respect both receiver and arguments, am I right? This will increase the "collateral damage" of unnecessary slow down even further, we would also have to forgo inlining for things like {[:object | object + each] cull: 42} etc.


Quite right.  Can you find out how common cases like these are?  I bet you there will be five methods or less in trunk that are affected.

Or this one:

1 to: 5 do: [:each |

    [result add: thisContext sender top] value; yourself].


And finally:


1 to: 5 do: [:each |

    result add: [thisContext tempVarNamed: 'each']].


I find these last two not compelling.  If one is dealing with thisContext one is close to the implementation so I would argue you get the unvarnished reality, not the ideal specification.

You can (almost) always crack the system, so - if we should decide to turn off inlining sometimes - however we do this, in each case, there will be a logical break at some position. In my humble opinion, this would cause more confusion than actually solve problems. There has never been given a guarantee that blocks will be not inlined.


Agreed.

If you wish to prevent this, the simplest and clearest solution IMO is still


1 to: 5 do: [:each |
    result add: [:object | object + each]
] yourself. 


Best,

Christoph


Von: Squeak-dev <[hidden email]> im Auftrag von Eliot Miranda <[hidden email]>
Gesendet: Donnerstag, 26. März 2020 17:20:08
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] Question about inlining | How to access named temps in FullBlockClosure?
 
Hi Marcel,

On Mar 26, 2020, at 5:15 AM, Marcel Taeumel <[hidden email]> wrote:

Hi Eliot, hi all!

I am having issues with the inlining of #to:do: in combination with decompiling a block and accessing the temps.

Here is a piece of code that works as expected:

| result context |
result := OrderedCollection new.

(1 to: 5) do: [:each |
result add: [:object | object + each]].

context := (result at: 3) outerContext.
context namedTempAt: (context tempNames indexOf: 'each'). " == 3"

HOWEVER, using #to:do: the result changes because code gets inlined:

| result context |
result := OrderedCollection new.

1 to: 5 do: [:each |
result add: [:object | object + each]].

context := (result at: 3) outerContext.
context namedTempAt: (context tempNames indexOf: 'each'). " == 6"

What is my goal here? I want to unpack the block statement and fill in the temps. For all those [:object | object + each] in "result" I want to collect the strings:

-> 'object + 1'
-> 'object + 2'
-> 'object + 3'
-> 'object + 4'
-> 'object + 5'

Is there a way to access the values of temps when the block closures are created this way?

Nice one.  So the thing to realize is that, because of inlining, the second example is in fact equivalent to

| result context each |
result := OrderedCollection new.

each := 1.
[ each <= 5 ] whileTrue:
    [ result add: [:object | object + each]].

context := (result at: 3) outerContext.
context namedTempAt: (context tempNames indexOf: 'each') "= 6"

and when written like this it is clear that the search for the value will end up being as found, 6.

So then the thing to realize is that the inlining decision is wrong.  That in fact the closing over of each (the index variable of the inlined to:do:) within an uninlined inner block means that the outer block cannot be inlined without breaking the inner block.

So the next thing to determine is whether the compiler can easily be modified to identify this case and hence prevent inlining in case like these.

After that we have to assess and find out how many cases in the system (eg in trunk) would no longer be inlined and decide whether the cure is worse than the disease.  The issue here is two fold, the loss of performance due to not inlining, and the slow down to compilation due to the check(s) needed to prevent inlining in this case.

Why would adding the check potentially cause many to:[by:]do: invocations to no longer be inlined? The check is to find anywhere we see a (non-optimized) block created that closes over an index variable and is passed as an argument.  Because this is a dynamic language we cannot assume anything about the message the block is an argument to; the programmer could redefine an implementation of a method with that selector to capture the block argument just as your example does.  So we have to assume the block could be captured, and hence that the to:[by:]do: in which it occurs cannot be inlined.

Other Smalltalks have decided to live with the disease.  I think that the right decision is *not* to live with the disease, but to fix the inliner, and wait for Scorch to “do the right thing” and inline dynamically.  Scorch can inline precisely because it does indeed find out about the caller, and inline if the caller does not capture the block, and adds the dependency information to enable the optimized version to be discarded if redefinition occurs.

The first thing to do is write a test asserting the correct behaviour.

The second thing to do is learn how and where the bytecode compiler decides whether or not to inline to:[by:]do:.

The third thing to do is to modify that inlining decision to generate correct code for your case.

The fourth thing to do is then to find out how many methods are affected.

Finally we can then assess whether we want to live with the disease or not.

Best,
Marcel

Best, Eliot
_,,,^..^,,,_ (phone)



Reply | Threaded
Open this post in threaded view
|

Re: Question about inlining | How to access named temps in FullBlockClosure?

Christoph Thiede
Hi Eliot,

> Quite right.  Can you find out how common cases like these are?  I bet you
> there will be five methods or less in trunk that are affected.

I did a full search in my image:

pattern := '.*to\:[\s\r\n]*\w+[\s\r\n]*do\:[\s\r\n]*\[[^\]]*\[.*' asRegex.
self systemNavigation allMethods select: [:m |
        pattern matches: m getSource].

My Source Files are very slow,
<http://forum.world.st/The-Inbox-ShoutCore-ct-78-mcz-tp5109909p5110050.html>  
so I interrupted the script after one hour, but I still found more than 250
matches. I took some samples and they looked indeed relevant. Nested #to:do:
loops, for example ...
Or do I misunderstand your idea of a rule?

However, if we can agree on leaving it as it is, it's irrelevant ;-)

Best,
Christoph



--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: Question about inlining | How to access named temps in FullBlockClosure?

Nicolas Cellier
Hi Christoph,

Le ven. 27 mars 2020 à 17:10, Christoph Thiede <[hidden email]> a écrit :
Hi Eliot,

> Quite right.  Can you find out how common cases like these are?  I bet you
> there will be five methods or less in trunk that are affected.

I did a full search in my image:

pattern := '.*to\:[\s\r\n]*\w+[\s\r\n]*do\:[\s\r\n]*\[[^\]]*\[.*' asRegex.
self systemNavigation allMethods select: [:m |
        pattern matches: m getSource].

My Source Files are very slow,

You might want to use:

    CurrentReadOnlySourceFiles cacheDuring: [ ... ]


<http://forum.world.st/The-Inbox-ShoutCore-ct-78-mcz-tp5109909p5110050.html
so I interrupted the script after one hour, but I still found more than 250
matches. I took some samples and they looked indeed relevant. Nested #to:do:
loops, for example ...
Or do I misunderstand your idea of a rule?

The problem is not so when evaluating nested block.
It's when returning a nested block for later evaluation.

However, if we can agree on leaving it as it is, it's irrelevant ;-)

Best,
Christoph



--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html



Reply | Threaded
Open this post in threaded view
|

Re: Question about inlining | How to access named temps in FullBlockClosure?

Christoph Thiede

Hi Nicolas,


CurrentReadOnlySourceFiles cacheDuring: [ ... ]


thanks, but this still kept my image busy for more than ten minutes!


The problem is not so when evaluating nested block.

> It's when returning a nested block for later evaluation.

I know, but unless the nested block is unlined, you cannot tell whether any message that is sent to it, such as #cull:, will store it for later?
The compiler cannot know that #cull: does not look like this, for example:

cull: anObject
    Project current addDeferredUIMessage: [self value].
    ^ self value: anObject

Best,
Christoph


Von: Squeak-dev <[hidden email]> im Auftrag von Nicolas Cellier <[hidden email]>
Gesendet: Montag, 30. März 2020 12:54:20
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] Question about inlining | How to access named temps in FullBlockClosure?
 
Hi Christoph,

Le ven. 27 mars 2020 à 17:10, Christoph Thiede <[hidden email]> a écrit :
Hi Eliot,

> Quite right.  Can you find out how common cases like these are?  I bet you
> there will be five methods or less in trunk that are affected.

I did a full search in my image:

pattern := '.*to\:[\s\r\n]*\w+[\s\r\n]*do\:[\s\r\n]*\[[^\]]*\[.*' asRegex.
self systemNavigation allMethods select: [:m |
        pattern matches: m getSource].

My Source Files are very slow,

You might want to use:

    CurrentReadOnlySourceFiles cacheDuring: [ ... ]


<http://forum.world.st/The-Inbox-ShoutCore-ct-78-mcz-tp5109909p5110050.html
so I interrupted the script after one hour, but I still found more than 250
matches. I took some samples and they looked indeed relevant. Nested #to:do:
loops, for example ...
Or do I misunderstand your idea of a rule?

The problem is not so when evaluating nested block.
It's when returning a nested block for later evaluation.

However, if we can agree on leaving it as it is, it's irrelevant ;-)

Best,
Christoph



--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html



Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: Question about inlining | How to access named temps in FullBlockClosure?

Levente Uzonyi
In reply to this post by Christoph Thiede
Hi Christoph,

On Fri, 27 Mar 2020, Christoph Thiede wrote:

> Hi Eliot,
>
>> Quite right.  Can you find out how common cases like these are?  I bet you
>> there will be five methods or less in trunk that are affected.
>
> I did a full search in my image:
>
> pattern := '.*to\:[\s\r\n]*\w+[\s\r\n]*do\:[\s\r\n]*\[[^\]]*\[.*' asRegex.
> self systemNavigation allMethods select: [:m |
> pattern matches: m getSource].
>
> My Source Files are very slow,
> <http://forum.world.st/The-Inbox-ShoutCore-ct-78-mcz-tp5109909p5110050.html>
> so I interrupted the script after one hour, but I still found more than 250
> matches. I took some samples and they looked indeed relevant. Nested #to:do:
> loops, for example ...

Try this instead:

| regex |
regex := 'to\:[\s\r\n]*\w+[\s\r\n]*do\:[\s\r\n]*\[[^\]]*\[' asRegex.
CurrentReadOnlySourceFiles cacheDuring: [
  SystemNavigation default browseAllSelect: [ :method |
  regex search: method getSource asString ] ].

The key differences are:
- caching source files (as Nicolas suggested)
- converting the source to String from Text (regex is intended to work
with Strings not Texts)
- using #search: instead of #matches:

This brings down the runtime to 2.5 seconds on my machine.


Levente

> Or do I misunderstand your idea of a rule?
>
> However, if we can agree on leaving it as it is, it's irrelevant ;-)
>
> Best,
> Christoph
>
>
>
> --
> Sent from: http://forum.world.st/Squeak-Dev-f45488.html

Reply | Threaded
Open this post in threaded view
|

Re: Question about inlining | How to access named temps in FullBlockClosure?

timrowledge


> On 2020-03-30, at 10:36 AM, Levente Uzonyi <[hidden email]> wrote:
>
> | regex |
> regex := 'to\:[\s\r\n]*\w+[\s\r\n]*do\:[\s\r\n]*\[[^\]]*\[' asRegex.
> CurrentReadOnlySourceFiles cacheDuring: [
> SystemNavigation default browseAllSelect: [ :method |
> regex search: method getSource asString ] ].


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Porting is such sweet sorrow



Reply | Threaded
Open this post in threaded view
|

Re: Question about inlining | How to access named temps in FullBlockClosure?

timrowledge
In reply to this post by Levente Uzonyi


> On 2020-03-30, at 10:36 AM, Levente Uzonyi <[hidden email]> wrote:
>
>
> Try this instead:
>
> | regex |
> regex := 'to\:[\s\r\n]*\w+[\s\r\n]*do\:[\s\r\n]*\[[^\]]*\[' asRegex.
> CurrentReadOnlySourceFiles cacheDuring: [
> SystemNavigation default browseAllSelect: [ :method |
> regex search: method getSource asString ] ].
>
> The key differences are:
> - caching source files (as Nicolas suggested)
> - converting the source to String from Text (regex is intended to work with Strings not Texts)
> - using #search: instead of #matches:
>
> This brings down the runtime to 2.5 seconds on my machine.

And a smidge over 10 sec on my Pi 4 :-)

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
A computer program does what you tell it to do, not what you want it to do.



Reply | Threaded
Open this post in threaded view
|

Re: Question about inlining | How to access named temps in FullBlockClosure?

Christoph Thiede

> | regex |
> regex := 'to\:[\s\r\n]*\w+[\s\r\n]*do\:[\s\r\n]*\[[^\]]*\[' asRegex.
> CurrentReadOnlySourceFiles cacheDuring: [
>          SystemNavigation default browseAllSelect: [ :method |
>                  regex search: method getSource asString ] ].

This still timed out my patience after 10 minutes :(

Von: Squeak-dev <[hidden email]> im Auftrag von tim Rowledge <[hidden email]>
Gesendet: Montag, 30. März 2020 19:42:31
An: The general-purpose Squeak developers list
Betreff: Re: [squeak-dev] Question about inlining | How to access named temps in FullBlockClosure?
 


> On 2020-03-30, at 10:36 AM, Levente Uzonyi <[hidden email]> wrote:
>
>
> Try this instead:
>
> | regex |
> regex := 'to\:[\s\r\n]*\w+[\s\r\n]*do\:[\s\r\n]*\[[^\]]*\[' asRegex.
> CurrentReadOnlySourceFiles cacheDuring: [
>        SystemNavigation default browseAllSelect: [ :method |
>                regex search: method getSource asString ] ].
>
> The key differences are:
> - caching source files (as Nicolas suggested)
> - converting the source to String from Text (regex is intended to work with Strings not Texts)
> - using #search: instead of #matches:
>
> This brings down the runtime to 2.5 seconds on my machine.

And a smidge over 10 sec on my Pi 4 :-)

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
A computer program does what you tell it to do, not what you want it to do.





Carpe Squeak!
Reply | Threaded
Open this post in threaded view
|

Re: Question about inlining | How to access named temps in FullBlockClosure?

Levente Uzonyi
You should fire up the VM with some profiler (I don't know what's
available on windows) to see why file access is so slow.


Levente

On Mon, 30 Mar 2020, Thiede, Christoph wrote:

>
> > | regex |
> > regex := 'to\:[\s\r\n]*\w+[\s\r\n]*do\:[\s\r\n]*\[[^\]]*\[' asRegex.
> > CurrentReadOnlySourceFiles cacheDuring: [
> >          SystemNavigation default browseAllSelect: [ :method |
> >                  regex search: method getSource asString ] ].
>
> This still timed out my patience after 10 minutes :(
>
> _________________________________________________________________________________________________________________________________________________________________________________________________________________________________
> Von: Squeak-dev <[hidden email]> im Auftrag von tim Rowledge <[hidden email]>
> Gesendet: Montag, 30. März 2020 19:42:31
> An: The general-purpose Squeak developers list
> Betreff: Re: [squeak-dev] Question about inlining | How to access named temps in FullBlockClosure?  
>
>
> > On 2020-03-30, at 10:36 AM, Levente Uzonyi <[hidden email]> wrote:
> >
> >
> > Try this instead:
> >
> > | regex |
> > regex := 'to\:[\s\r\n]*\w+[\s\r\n]*do\:[\s\r\n]*\[[^\]]*\[' asRegex.
> > CurrentReadOnlySourceFiles cacheDuring: [
> >        SystemNavigation default browseAllSelect: [ :method |
> >                regex search: method getSource asString ] ].
> >
> > The key differences are:
> > - caching source files (as Nicolas suggested)
> > - converting the source to String from Text (regex is intended to work with Strings not Texts)
> > - using #search: instead of #matches:
> >
> > This brings down the runtime to 2.5 seconds on my machine.
>
> And a smidge over 10 sec on my Pi 4 :-)
>
> tim
> --
> tim Rowledge; [hidden email]; http://www.rowledge.org/tim
> A computer program does what you tell it to do, not what you want it to do.
>
>
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Question about inlining | How to access named temps in FullBlockClosure?

Eliot Miranda-2
In reply to this post by Christoph Thiede
Hi Christoph,

   apologies for the late reply.  Busy weekend trying to get the Vm to run if compiled with MSVC.  Still not out of the woods...

On Fri, Mar 27, 2020 at 9:10 AM Christoph Thiede <[hidden email]> wrote:
Hi Eliot,

> Quite right.  Can you find out how common cases like these are?  I bet you
> there will be five methods or less in trunk that are affected.

I did a full search in my image:

pattern := '.*to\:[\s\r\n]*\w+[\s\r\n]*do\:[\s\r\n]*\[[^\]]*\[.*' asRegex.
self systemNavigation allMethods select: [:m |
        pattern matches: m getSource].

Well, that's not what I meant by a search.  However, as Levente pointed out, textual searches should be surrounded with CurrentReadOlySouceFiles cacheDuring:.  I think this is an awful implementation and would implement it very differently but that's the work-around we have in place now,

What I meant my a search was to look at MesasageNode>>transform: which is sent in MessageNode>>receiver:selector:arguments:precedence:from:.  This means it is ideal.  First of all it is bottom up; the to:[by:]do: message node is only created and transformed once all its sub components have been parsed.  So one is in a position in MessageNode>>transformToDo: to search the parse tree of the last block argument looking for accesses in non-optimized blocks (*) to its argument.  (*) remember its *any* reference in an unoptimized block, no mater how deeply nested.  So a reference to the block argument within an optimized block that is within an unoptimized block is till a reference.  e.g.

    1 to: 5 do:
        [:arg|
         arg even
            ifTrue: [evens add: [arg]]
            ifFalse: [odds add: [arg]]

are still references even though they're both in optimized blocks.  That's not a good example.  I should say something like

    1 to: 5 do:
        [:arg|
         them add: [arg even
                                ifTrue: [arg]
                                ifFalse: [arg+arg]]]

whereas here it doesn't matter:

    1 to: 5 do:
        [:arg|
         them add: (arg even
                                ifTrue: [arg]
                                ifFalse: [arg+arg])]

you get the idea.  So the search is tricky.  Basically as the tree is descended we need to either set a flag that says the search is now in an unoptimized block, or the search needs to look up towards the root block arg to to:[by:]do: to find an intervening unoptimized block.
 
My Source Files are very slow,
<http://forum.world.st/The-Inbox-ShoutCore-ct-78-mcz-tp5109909p5110050.html
so I interrupted the script after one hour, but I still found more than 250
matches. I took some samples and they looked indeed relevant. Nested #to:do:
loops, for example ...
Or do I misunderstand your idea of a rule?

Yes.  It has to be something in the parse tree hiersecy, actually in transformToDo:.  And its th cost of applying this rule during compilation that is one3 thing to be concerned about.

Now, in orde4r to accelerate a search for candidates remember that Nicolas wonderfully changed the compiler to add as literals the selectors of even optimized message sends.  So for example all uses of #to:[by:]do: have the selectors in their literal frame.  So to find out all the methods that you want to analyze further simply start from

((self systemNavigation allCallsOn: #to:do:), 
(self systemNavigation allCallsOn: #to:by:do:)) asSet asArray sorted

BTW in my VMMaker image 
    ((self systemNavigation allCallsOn: #to:do:), 
    (self systemNavigation allCallsOn: #to:by:do:)) asSet asArray sorted size 2628

So you could enumerate over these, grab the parse trees, and then search the parse trees to try and find how many methods are affected and to get a rush idea of how expensive the search is.

However, if we can agree on leaving it as it is, it's irrelevant ;-)

Best,
Christoph
--
Sent from: http://forum.world.st/Squeak-Dev-f45488.html

_,,,^..^,,,_
best, Eliot


Reply | Threaded
Open this post in threaded view
|

CurrenReadOnlySourceFiles (was: Re: Question about inlining | How to access named temps in FullBlockClosure?)

Levente Uzonyi
Hi Eliot,

On Mon, 30 Mar 2020, Eliot Miranda wrote:

> Well, that's not what I meant by a search.  However, as Levente pointed out, textual searches should be surrounded with CurrentReadOlySouceFiles cacheDuring:.  I think this is an awful implementation and would implement it
> very differently but that's the work-around we have in place now,

How would you implement it?

<history>
When I introduced CurrentReadOnlySourceFiles, I wanted to solve the
issue of concurrent access to the source files.
I had the following options:
1) controlled access to a shared resource (a single read-only copy of the
SourceFiles array) with e.g. a Monitor
2) same as 1) but with multiple copies pooled
3) exceptions to define the scope and lifetime of the resources (the
read-only copies) within a process

I chose the third option because it was possible to introduce it without
exploring and rewriting existing users: you could leave all code as-is
and sprinke CurrentReadOnlySourceFiles cacheDuring: [ ... ] around code
that needed better performance.
It's obviously not a perfect solution, but I still think it was the best
available at the time.

Later ProcessLocalVariables were added to Trunk. Which could be used to
solve the concurrent access issue by using process-local copies of the
source files. The only challenge is to release them after they are not
needed any more. Perhaps a high priority process could do that after a
few minutes of inactivity. Or we could just let them linger and see if
they cause any problems.
</history>


Levente

Reply | Threaded
Open this post in threaded view
|

Re: CurrenReadOnlySourceFiles (was: Re: Question about inlining | How to access named temps in FullBlockClosure?)

timrowledge


> On 2020-03-30, at 2:21 PM, Levente Uzonyi <[hidden email]> wrote:
>
> When I introduced CurrentReadOnlySourceFiles, I wanted to solve the issue of concurrent access to the source files.
> I had the following options:
> 1) controlled access to a shared resource (a single read-only copy of the SourceFiles array) with e.g. a Monitor
> 2) same as 1) but with multiple copies pooled
> 3) exceptions to define the scope and lifetime of the resources (the read-only copies) within a process

Without having really considered this in years, if I remember right almost all this problem is at root caused by the stupidity of us having a file access regime based on the ansi stream stuff - ie set the position, do a read (or write, whatever) and since they are separate operations one can get utterly screwed up by multiple processes having access. That part completely goes away with a more sensible approach where the position and the read/write are a single operation. As in readFromThisFileAcessorAtThisPositionThisManyBytes. I suspect one could find email from me about this from '96 at least.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
When all else fails, let a = 7.  If that doesn't help, then read the manual.



Reply | Threaded
Open this post in threaded view
|

Re: CurrenReadOnlySourceFiles (was: Re: Question about inlining | How to access named temps in FullBlockClosure?)

David T. Lewis
On Mon, Mar 30, 2020 at 03:33:09PM -0700, tim Rowledge wrote:

>
>
> > On 2020-03-30, at 2:21 PM, Levente Uzonyi <[hidden email]> wrote:
> >
> > When I introduced CurrentReadOnlySourceFiles, I wanted to solve the issue of concurrent access to the source files.
> > I had the following options:
> > 1) controlled access to a shared resource (a single read-only copy of the SourceFiles array) with e.g. a Monitor
> > 2) same as 1) but with multiple copies pooled
> > 3) exceptions to define the scope and lifetime of the resources (the read-only copies) within a process
>
> Without having really considered this in years, if I remember right
> almost all this problem is at root caused by the stupidity of us having
> a file access regime based on the ansi stream stuff - ie set the position,
> do a read (or write, whatever) and since they are separate operations
> one can get utterly screwed up by multiple processes having access.
> That part completely goes away with a more sensible approach where
> the position and the read/write are a single operation. As in
> readFromThisFileAcessorAtThisPositionThisManyBytes. I suspect one could
> find email from me about this from '96 at least.
>

I think I do recall that ancient conversation :-)

But I am a bit skeptical. After all, we are caching the file position
in StandardFileStream>>position so if there are concurrency issues,
it's not entirely obvious where the fault may lie.

How to find out? Test using a Windows VM. Andreas wrote the file primitives
for Windows to operate at the direct file HANDLE level, rather than the
higher level stdio streams that we use for the other VMs. If the original
concurrency issues go away when running on Windows, then maybe it's true
that the problem is caused by reliance on stdio streams in our non-Windows
VMs. But if the problems also exist on Windows, then the problem is more
likely right here in our own image.

If you consider the general concept of a stream over an underlying
physical storage medium, it is quite natural to think of the possibility
of multiple streams over the same underlying storage medium. That is
what stdio streams do. I am not entirely confident that we are doing
a good job of mapping that concept into StandardFileStream in Squeak.

Dave

Reply | Threaded
Open this post in threaded view
|

Re: CurrenReadOnlySourceFiles (was: Re: Question about inlining | How to access named temps in FullBlockClosure?)

Eliot Miranda-2
In reply to this post by Levente Uzonyi
Hi Levente,

> On Mar 30, 2020, at 2:21 PM, Levente Uzonyi <[hidden email]> wrote:
>
> Hi Eliot,
>
>> On Mon, 30 Mar 2020, Eliot Miranda wrote:
>>
>> Well, that's not what I meant by a search.  However, as Levente pointed out, textual searches should be surrounded with CurrentReadOlySouceFiles cacheDuring:.  I think this is an awful implementation and would implement it
>> very differently but that's the work-around we have in place now,
>
> How would you implement it?
>
> <history>
> When I introduced CurrentReadOnlySourceFiles, I wanted to solve the issue of concurrent access to the source files.
> I had the following options:
> 1) controlled access to a shared resource (a single read-only copy of the SourceFiles array) with e.g. a Monitor
> 2) same as 1) but with multiple copies pooled
> 3) exceptions to define the scope and lifetime of the resources (the read-only copies) within a process
>
> I chose the third option because it was possible to introduce it without exploring and rewriting existing users: you could leave all code as-is
> and sprinke CurrentReadOnlySourceFiles cacheDuring: [ ... ] around code that needed better performance.
> It's obviously not a perfect solution, but I still think it was the best available at the time.
>
> Later ProcessLocalVariables were added to Trunk. Which could be used to solve the concurrent access issue by using process-local copies of the source files. The only challenge is to release them after they are not needed any more. Perhaps a high priority process could do that after a few minutes of inactivity. Or we could just let them linger and see if they cause any problems.
> </history>

I think the key issue (& this from a discussion here with Bert) is access time source in the debugger while one is debugging file access.  As the debugger asks for source so the file pointer is moved and hence screws up the access one is trying to debug.

So I would provide something like
   SourceFiles withSubstituteCopiesWhile: aBlock
which would install either copies of the files or read-only copies of the files for the duration of the block, and have the debugger use the method around its access to method source.

The IDE is essentially single threaded as far as code modification goes, even if this isn’t enforced. There is no serialization on adding/removing methods and concurrent access can corrupt method dictionaries, and that limitation is fine in practice.  So having the same restriction on source file access is fine too (and in fact I think the restriction already applies; if one were to fork compiles then source update to the changes file could get corrupted too).

So I think not using read-only copies to read source, and having the debugger use copies for its access would be a good lightweight solution.

> Levente

Eliot

Reply | Threaded
Open this post in threaded view
|

Re: CurrenReadOnlySourceFiles (was: Re: Question about inlining | How to access named temps in FullBlockClosure?)

Levente Uzonyi
Hi Eliot,

On Mon, 30 Mar 2020, Eliot Miranda wrote:

> Hi Levente,
>
>> On Mar 30, 2020, at 2:21 PM, Levente Uzonyi <[hidden email]> wrote:
>>
>> Hi Eliot,
>>
>>> On Mon, 30 Mar 2020, Eliot Miranda wrote:
>>>
>>> Well, that's not what I meant by a search.  However, as Levente pointed out, textual searches should be surrounded with CurrentReadOlySouceFiles cacheDuring:.  I think this is an awful implementation and would implement it
>>> very differently but that's the work-around we have in place now,
>>
>> How would you implement it?
>>
>> <history>
>> When I introduced CurrentReadOnlySourceFiles, I wanted to solve the issue of concurrent access to the source files.
>> I had the following options:
>> 1) controlled access to a shared resource (a single read-only copy of the SourceFiles array) with e.g. a Monitor
>> 2) same as 1) but with multiple copies pooled
>> 3) exceptions to define the scope and lifetime of the resources (the read-only copies) within a process
>>
>> I chose the third option because it was possible to introduce it without exploring and rewriting existing users: you could leave all code as-is
>> and sprinke CurrentReadOnlySourceFiles cacheDuring: [ ... ] around code that needed better performance.
>> It's obviously not a perfect solution, but I still think it was the best available at the time.
>>
>> Later ProcessLocalVariables were added to Trunk. Which could be used to solve the concurrent access issue by using process-local copies of the source files. The only challenge is to release them after they are not needed any more. Perhaps a high priority process could do that after a few minutes of inactivity. Or we could just let them linger and see if they cause any problems.
>> </history>
>
> I think the key issue (& this from a discussion here with Bert) is access time source in the debugger while one is debugging file access.  As the debugger asks for source so the file pointer is moved and hence screws up the access one is trying to debug.
I don't think that's the only issue. Have a look at the senders of
#readOnlyCopy. Many of them were added 10+ years ago, well before
CurrentReadOnlySourceFiles was introduced. Most of those could use
CurrentReadOnlySourceFiles too but are unrelated to the debugger.

>
> So I would provide something like
>   SourceFiles withSubstituteCopiesWhile: aBlock
> which would install either copies of the files or read-only copies of the files for the duration of the block, and have the debugger use the method around its access to method source.
>
> The IDE is essentially single threaded as far as code modification goes, even if this isn’t enforced. There is no serialization on adding/removing methods and concurrent access can corrupt method dictionaries, and that limitation is fine in practice.  So having the same restriction on source file access is fine too (and in fact I think the restriction already applies; if one were to fork compiles then source update to the changes file could get corrupted too).
>
> So I think not using read-only copies to read source, and having the debugger use copies for its access would be a good lightweight solution.

I agree with what you wrote about method changes, but reading the sources
concurrently is still a possibility, especially when multiple UI processes
can exist at the same time (e.g. that's what opening a debugger does,
right?).


Levente

>
>> Levente
>
> Eliot

Reply | Threaded
Open this post in threaded view
|

Re: CurrenReadOnlySourceFiles (was: Re: Question about inlining | How to access named temps in FullBlockClosure?)

timrowledge


> On 2020-03-31, at 3:31 PM, Levente Uzonyi <[hidden email]> wrote:
>  Have a look at the senders of #readOnlyCopy. Many of them were added 10+ years ago,

IIRC that was added almost at the very beginning of Squeak. I had to do an entire fake-filing-system VM layer to allow RISC OS to cope with the very weird seeming idea of allowing multiple openings of the same file. It certainly wasn't a thing inherited from Smalltalk-80v2 - I would have had to do sometihng similar in '87 if it were.

>>
>
> I agree with what you wrote about method changes, but reading the sources concurrently is still a possibility, especially when multiple UI processes can exist at the same time (e.g. that's what opening a debugger does, right?).

We should have the sources file opened once for reading and the changes file opened once for read/write. We should handle everything else by providing redirected access from streams of whatever sort and a a sensible set of primitives.

A read should be atomic and specified with the starting point, the number of bytes to read and where to store them; it should return the number of bytes read or an error code. A write should be atomic and specify the start point, the number to write and the place to read from; it should likewise return the number written or an error code.

That way having a bunch of streams connecting will work safely; you read from one and it cannot be mangled by another action moving the file position in the midst of the read/write.


tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
You never finish a program, you just stop working on it.



Reply | Threaded
Open this post in threaded view
|

Re: CurrenReadOnlySourceFiles (was: Re: Question about inlining | How to access named temps in FullBlockClosure?)

Eliot Miranda-2
In reply to this post by Levente Uzonyi
Hi Levente,

On Tue, Mar 31, 2020 at 3:33 PM Levente Uzonyi <[hidden email]> wrote:
Hi Eliot,

On Mon, 30 Mar 2020, Eliot Miranda wrote:

> Hi Levente,
>
>> On Mar 30, 2020, at 2:21 PM, Levente Uzonyi <[hidden email]> wrote:
>>
>> Hi Eliot,
>>
>>> On Mon, 30 Mar 2020, Eliot Miranda wrote:
>>>
>>> Well, that's not what I meant by a search.  However, as Levente pointed out, textual searches should be surrounded with CurrentReadOlySouceFiles cacheDuring:.  I think this is an awful implementation and would implement it
>>> very differently but that's the work-around we have in place now,
>>
>> How would you implement it?
>>
>> <history>
>> When I introduced CurrentReadOnlySourceFiles, I wanted to solve the issue of concurrent access to the source files.
>> I had the following options:
>> 1) controlled access to a shared resource (a single read-only copy of the SourceFiles array) with e.g. a Monitor
>> 2) same as 1) but with multiple copies pooled
>> 3) exceptions to define the scope and lifetime of the resources (the read-only copies) within a process
>>
>> I chose the third option because it was possible to introduce it without exploring and rewriting existing users: you could leave all code as-is
>> and sprinke CurrentReadOnlySourceFiles cacheDuring: [ ... ] around code that needed better performance.
>> It's obviously not a perfect solution, but I still think it was the best available at the time.
>>
>> Later ProcessLocalVariables were added to Trunk. Which could be used to solve the concurrent access issue by using process-local copies of the source files. The only challenge is to release them after they are not needed any more. Perhaps a high priority process could do that after a few minutes of inactivity. Or we could just let them linger and see if they cause any problems.
>> </history>
>
> I think the key issue (& this from a discussion here with Bert) is access time source in the debugger while one is debugging file access.  As the debugger asks for source so the file pointer is moved and hence screws up the access one is trying to debug.

I don't think that's the only issue. Have a look at the senders of
#readOnlyCopy. Many of them were added 10+ years ago, well before
CurrentReadOnlySourceFiles was introduced. Most of those could use
CurrentReadOnlySourceFiles too but are unrelated to the debugger.

Yes, but IIRC that issue was to separate the writable file from the read-only file.  I remember dealing with this when working on Newspeak in 2007/2008. So SourceFiles can easily maintain a writable file and a read-only copy of the file for both sources and changes and do writes through the writable one.


>
> So I would provide something like
>   SourceFiles withSubstituteCopiesWhile: aBlock
> which would install either copies of the files or read-only copies of the files for the duration of the block, and have the debugger use the method around its access to method source.
>
> The IDE is essentially single threaded as far as code modification goes, even if this isn’t enforced. There is no serialization on adding/removing methods and concurrent access can corrupt method dictionaries, and that limitation is fine in practice.  So having the same restriction on source file access is fine too (and in fact I think the restriction already applies; if one were to fork compiles then source update to the changes file could get corrupted too).
>
> So I think not using read-only copies to read source, and having the debugger use copies for its access would be a good lightweight solution.

I agree with what you wrote about method changes, but reading the sources
concurrently is still a possibility, especially when multiple UI processes
can exist at the same time (e.g. that's what opening a debugger does,
right?).

My assertion is that the IDE is essentially single0-threaded and this doesn't;t have to be supported.  In any case, concurrent access will work if processes of the same priority level are cooperating.  But I just answered the debugger issue.  I'm assuming that the debugger guards all its source access by substituting a different file.  So it, and only it, accesses the sources files through copies, and it, and only it pays the cost for substituting the copies.  Normal queries can use a single read-only copy.  That gives us the functionality of cacheDuring: without having to invoke it.


So let me reiterate.

SourceFiles is modified to have a single writable version of the changes file and a single read-only version of sources nd changes files.  Source code is read through the readable copy and new source written through the writable copy.  Whenever the debugger accesses source it does so through a method that first saves the files, substitutes copies in SourcesFiles, evaluates its block (that will access source through the copies), and then ensures that the original files are restored.  There can be error checking for writing to the changes file in the debugger while writes are in progress to the original writable changes file, although I'm not sure this is necessary; folks debugging source file access usually know what they're doing.

The result is that
- normal source reading does not require creating a read-only copy; it already exists.
- the debugger does not interfere with source access because it is careful to use copies and leave the originals undisturbed
- CurrentReadOnlySourceFiles and cacheDuring: can be discarded





Levente

>
>> Levente
>
> Eliot


--
_,,,^..^,,,_
best, Eliot


12