Cascading


Cascading

stepha...@etas.com

Hello, does anyone know whether cascading methods can produce more performant code? I have always wondered why lint suggests cascading methods. I prefer not to cascade, because it is often easier to edit the code or to debug selected parts. But a colleague of mine recently said that cascading would increase performance. I do not know how to test that. Shouldn't the compiler be able to see that the code could be cascaded, and optimise it, if lint can see this?

Thank you in advance for any answer.


Re: Cascading

Marcus Wagner
Short answer to "can produce more performant code":

Yes, it can: in compiler-theory terms this is common subexpression elimination, applied to the left-hand side of the cascade.

For example, calling
    a b c method1; method2
evaluates b and c only once to determine the common receiver of method1 and method2, whereas
    a b c method1.
    a b c method2
obviously has to evaluate a b c twice, at additional cost.
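
To make the cascade semantics concrete, here is a minimal, self-contained sketch using only standard stream protocol (nothing VAST-specific is assumed): every message after a semicolon is sent to the receiver of the first message in the cascade, so the receiver expression is evaluated exactly once.

    "Minimal sketch: all sends in the cascade go to the same WriteStream,
     which is created only once; the value of the cascade is the value of
     the last send (#contents)."
    | greeting |
    greeting := (WriteStream on: String new)
        nextPutAll: 'Hello';
        space;
        nextPutAll: 'world';
        contents.
    Transcript show: greeting; cr    "prints: Hello world"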

That leads to your argument that an optimizing compiler should detect this and eliminate the redundancy for you.
But here you also have to ask whether the optimized code has the same effect.
Unfortunately, the answer is no, not always.

I dare say that such a proof is impossible in general, as it would require a thorough analysis of the side effects of evaluating a b c once versus twice.

An example is Seaside with its rendering protocol:

  html heading level: 1; with: 'Hello world'.
  html heading level: 1; with: 'Hello world'.

has a different effect than

  html heading
    level: 1; with: 'Hello world';
    level: 1; with: 'Hello world'.

So in such cases (protocols with side effects in general) you have to write precisely what you mean.

Lint and any similar tool will therefore only give you a hint that cascading may gain performance; it is up to you to decide whether it is also correct to do so.

The same applies vice versa, when expanding a cascade into a series of individual sends.

How to test that performance gain: a simple means is to benchmark repeated execution of such a code fragment (for example with Envy/Stats).
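
A minimal workspace sketch of such a benchmark, assuming only the standard Time class>>#millisecondsToRun: and collection protocol are available in your image (Envy/Stats gives more detailed figures); the OrderedCollection fragment is merely a stand-in for your own code:

    "Time many repetitions of the non-cascaded and the cascaded variant.
     Here the receiver is a simple temp, so the cascade mainly saves the
     extra receiver push and dispatch per send."
    | separate cascaded |
    separate := Time millisecondsToRun: [
        100000 timesRepeat: [
            | c |
            c := OrderedCollection new.
            c add: 1.
            c add: 2.
            c add: 3]].
    cascaded := Time millisecondsToRun: [
        100000 timesRepeat: [
            | c |
            c := OrderedCollection new.
            c add: 1; add: 2; add: 3]].
    Transcript
        show: 'separate: ', separate printString, ' ms'; cr;
        show: 'cascaded: ', cascaded printString, ' ms'; cr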


Another way, for insiders, is to investigate the emitted bytecode:

For example
    html heading
        level: 1;
        with: 'Hello world'.
    html heading
        level: 1;
        with: 'Hello world'

yields

0000 00                    temps      = 0
0001 01                    args       = 1
0002 00                    prim       = 0   VMprSendNoTempsNoBlock
0003 00                    stackFrame = 0

0004 7F                    BCpushTemp0 - (sp) push temp 0 (html)
0005 87                    BCsendUnary0 - (sp) send unary 0 (#heading), args = 0
0006 0F                    BCdupTOS -      dup TOS
0007 41 03                 BCpushMagicB - (b ) push magic 3 (SmallInteger 1)
0009 66 20                 BCsendDropArgs1 - (sp) send/drop literal 1 (#level:), args = 1
0011 25 24                 BCpushLiteralB - (b ) push literal 2 ('Hello world')
0013 66 28                 BCsendDropArgs1 - (sp) send/drop literal 3 (#with:), args = 1
0015 7F                    BCpushTemp0 - (sp) push temp 0 (html)
0016 87                    BCsendUnary0 - (sp) send unary 0 (#heading), args = 0
0017 0F                    BCdupTOS -      dup TOS
0018 41 03                 BCpushMagicB - (b ) push magic 3 (SmallInteger 1)
0020 66 20                 BCsendDropArgs1 - (sp) send/drop literal 1 (#level:), args = 1
0022 25 24                 BCpushLiteralB - (b ) push literal 2 ('Hello world')
0024 66 28                 BCsendDropArgs1 - (sp) send/drop literal 3 (#with:), args = 1
0026 44                    BCreturnTOS -      return TOS

whereas

    html
        level: 1;
        with: 'Hello world';
        level: 1;
        with: 'Hello world'

gives more compact code:

0000 00                    temps      = 0
0001 01                    args       = 1
0002 00                    prim       = 0   VMprSendNoTempsNoBlock
0003 00                    stackFrame = 0

0004 7F                    BCpushTemp0 - (sp) push temp 0 (html)
0005 0F                    BCdupTOS -      dup TOS
0006 41 03                 BCpushMagicB - (b ) push magic 3 (SmallInteger 1)
0008 5C 1C 01              BCsendDropDupLiteralB - (b ) send/drop/dup literal 0 (#level:), args = 1
0011 25 20                 BCpushLiteralB - (b ) push literal 1 ('Hello world')
0013 5C 24 01              BCsendDropDupLiteralB - (b ) send/drop/dup literal 2 (#with:), args = 1
0016 41 03                 BCpushMagicB - (b ) push magic 3 (SmallInteger 1)
0018 66 1C                 BCsendDropArgs1 - (sp) send/drop literal 0 (#level:), args = 1
0020 25 20                 BCpushLiteralB - (b ) push literal 1 ('Hello world')
0022 66 24                 BCsendDropArgs1 - (sp) send/drop literal 2 (#with:), args = 1
0024 44                    BCreturnTOS -      return TOS

The second method is two bytes shorter and needs four fewer bytecode dispatches, indicating a potential slight performance gain.
However, as explained above, in Seaside the compact code will not have the same effect.

Summary: be careful. Converting cascades is tricky with respect to side effects.
Remember: protocols like
    a ifNil: [b]
    a and: [b]
vs.
    a isNil ifTrue: [a]
    a & b
also deal with this topic of common subexpression elimination and evaluation order.
It is up to you to choose what you want, rather than leaving it to an optimizing compiler.
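
As a small illustration of the #and: versus #& difference, using only standard Boolean and block protocol (the counter is only there to make the evaluation visible):

    "Sketch: #and: evaluates its argument block only when the receiver is
     true, whereas #& always evaluates both operands."
    | count |
    count := 0.
    false and: [count := count + 1. true].
    Transcript show: 'and: evaluated its argument ', count printString, ' time(s)'; cr.
        "-> 0 time(s): the block is never run"
    count := 0.
    false & [count := count + 1. true] value.
    Transcript show: '&   evaluated its argument ', count printString, ' time(s)'; cr
        "-> 1 time(s): the second operand is always evaluated"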

Kind regards
Marcus

Re: Cascading

amitin
Hello,

For cascaded sends the VAST Smalltalk compiler keeps the receiver on TOS by using a special send bytecode which dups TOS (and drops the result) after the message send completes, so there is no additional bytecode to dispatch per message send.
So in interpreted mode the cascade should be slightly more performant, but only if you cascade more than two messages.
In JIT mode there is no bytecode dispatching any more, and the code which pushes the receiver onto the stack is inlined, but copying the receiver on the stack is still a little more efficient.
For example, the x64 code for BCpushTemp0:
movq    8(%rdx), %rax
movq    %rax, -8(%r15)
addq     $-8, %r15

Whereas dropping and duplicating TOS (which we use for cascades) looks like this:
movq   8(%r15), %rax
movq    %rax, (%r15)



Re: Cascading

Richard Sargent
Administrator
In reply to this post by stepha...@etas.com
You have seen the answers from others with the data, but there are two other factors to consider about this construct.

1) When the Alto was produced, it ran at a little over 5 MHz. Today's computers aren't quite 1000x faster, but close enough. Even into the early 90s, 60 MHz was a serious computer. So, yes: back then, every bytecode counted, and every one that could be eliminated was significant. With today's computers, you probably could not measure a real-world saving from using cascades in your application.

2) But, this is still an important point: Smalltalk was designed to be comprehensible and to communicate intention. So, it remains true that the fewer tokens a reader has to examine, the better the reader will comprehend what is intended. In my opinion, the cascade is easier to see and comprehend, because there are no extraneous tokens to parse.
It's a *little* like a shopping list. You normally put the list on one sheet of paper rather than each item on individual pieces of paper. Both would work, but one would help you more than the other.



Re: Cascading

Vizio Costar
I'm all for writing efficient, readable code. However, in the 80s, when I was working in embedded systems with a Motorola 68000, we had Pascal code that was too slow. So, several of us went through the (rather large) code base and rewrote sections with performance in mind such as exiting loops immediately when possible.

After loading the code into our test system, and running it using an emulator (a device that plugs into the cpu socket), we found that the performance improved by several milliseconds. IOW, we wasted our time.

Today, if I had to prioritize, it would be 1. Readability and 2. Efficiency, under normal conditions/requirements.


Re: Cascading

Richard Sargent
Administrator
On Thursday, April 1, 2021 at 11:19:46 PM UTC-7 [hidden email] wrote:
I'm all for writing efficient, readable code. However, in the 80s, when I was working in embedded systems with a Motorola 68000, we had Pascal code that was too slow. So, several of us went through the (rather large) code base and rewrote sections with performance in mind such as exiting loops immediately when possible.

After loading the code into our test system, and running it using an emulator (a device that plugs into the cpu socket), we found that the performance improved by several milliseconds. IOW, we wasted our time.

I have a similar story from the 90s. A colleague spent his time micro-optimizing the IBM Smalltalk code while I worked on reducing the round trips to the server. He shaved off fractions of a second, maybe even a few whole seconds. I shaved better than 75% off the two-minute time to download a dataset.


Today, if I had to prioritize, it would be 1. Readability and 2. Efficiency, under normal conditions/requirements.

100% agreement!


