[vwnc] [enh] VW7.6 Speeding up the low space process

[vwnc] [enh] VW7.6 Speeding up the low space process

Martin McClure
I was doing some profiling runs of something that was spending a lot of
time in garbage collection. I noticed that the LowSpaceProcess was
spending a full 1/3 of its time in ResolvedDeferredBinding>>value.

I eliminated this by replacing ObjectMemory class>>lowSpaceLoop with the
version below, which moves the three shared variable accesses outside
the loop.

Perhaps you want to do something like this in VW7.7?

Regards,

-Martin


lowSpaceLoop
       
        | lss lsm cmp |
        lss := LowSpaceSemaphore.
        lsm := LowSpaceMutex.
        cmp := CurrentMemoryPolicy.
        [lss wait.
        lsm critical: [[cmp lowSpaceAction] ensure: [self resetLowSpaceSignal]]]
                repeat
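
For comparison, the shipped lowSpaceLoop presumably reads the three shared
variables inside the loop body on every iteration, roughly like this (a
reconstruction based on the discussion in this thread, not the actual VW7.6
source):

lowSpaceLoop

        [LowSpaceSemaphore wait.
        LowSpaceMutex critical: [[CurrentMemoryPolicy lowSpaceAction] ensure: [self resetLowSpaceSignal]]]
                repeat

The proposed change only hoists those shared variable reads out of the loop;
the wait, the mutex, and the low space action are unchanged.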

Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Andres Valloud-6
Thank you!  AR 55983.

Andres.


Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Alan Knight-2
The problem is that this reveals issues with the profiler more than it reveals performance penalties in the low space loop. In general, if something happens that can't readily be interrupted by the profiler (e.g. time spent in primitives), the profiler ends up interrupting the first instruction that runs after that activity completes, and charging the time to it. That doesn't even necessarily have to be in the same process that was doing the work. So the very first message send of lowSpaceLoop is one of those places that is likely to show up as consuming a large amount of time, even though, if we consider the matter, it's hard to see how the send of a message whose runtime is normally around that of a constant reference would actually make a difference. If you're profiling, it's important to make sure not only that you see what the profiler shows, but that changing it makes a significant difference. The only situation where resolving bindings is slow is immediately after a modifySystem:, but then the method showing up wouldn't be the one in ResolvedDeferredBinding.

So, for example, if we extract the first ResolvedDeferredBinding from that method and evaluate the following in an inspector on it:

Time millisecondsToRun: [10000000 timesRepeat: [self value]]   => 214, 213, 176, 203
vs.
Time millisecondsToRun: [10000000 timesRepeat: [3 value]]      => 133, 112, 121, 124

it doesn't seem like something that would provide a big improvement.



Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Henrik Sperre Johansen
I tested Martin's suggestion; no change in total runtime for a batch
operation in the app I'm working on.
I did have a similar problem some time ago, though. The solution then
was to use a MemoryPolicy with an incrementalAllocationThreshold much
larger than 10000 bytes. (This is for a batch operation during which I
don't really care about incremental GC as long as the machine does not
run out of memory; I switch back to a normal policy afterwards.)
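
A rough sketch of the kind of policy switch described above might look like
the following. The selectors #incrementalAllocationThreshold: and #install,
and the class-side accessor ObjectMemory currentMemoryPolicy, are assumptions
here; check the MemoryPolicy and ObjectMemory protocols in your image for the
exact names.

| oldPolicy batchPolicy |
oldPolicy := ObjectMemory currentMemoryPolicy.          "assumed accessor for the CurrentMemoryPolicy shared variable"
batchPolicy := oldPolicy copy.                          "or configure a fresh policy instance"
batchPolicy incrementalAllocationThreshold: 10000000.   "much larger than the default 10000 bytes"
batchPolicy install.                                    "assumed way to make it the current policy"
[
        "run the batch operation here"
] ensure: [oldPolicy install]

The point is simply that incremental GC is effectively deferred for the
duration of the batch and the normal policy is restored afterwards.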
 
There have been quite a lot of other memory-allocation improvements in
the app since then, but resetting incrementalAllocationThreshold to
10000 still turns this:
(1m50s runtime total)
7.4 Processes at priority 90
    7.4 BlockClosure>>on:do:
      7.4 [] in ObjectMemory class>>installScavengeNotification
        7.4 ObjectMemory class>>scavengeOccurred
          7.4 primitives
 into:
(2m6s runtime total)
16.8 Processes at priority 91
    16.8 BlockClosure>>on:do:
      16.8 [] in ObjectMemory class>>installLowSpaceProcess
        16.8 ObjectMemory class>>lowSpaceLoop
          13.6 Semaphore>>critical:
            13.6 BlockClosure>>ensure:
              13.6 [] in ObjectMemory class>>lowSpaceLoop
                13.6 BlockClosure>>ensure:
                  13.6 [] in [] in ObjectMemory class>>lowSpaceLoop
                    13.6 AsapBatchMemoryPolicy>>lowSpaceAction
                      13.5 MemoryPolicy>>incrementalReclamation
                        13.5 MemoryPolicy>>incrementalGC
                          13.5 ObjectMemory class>>primIncrementalGCIsInterruptable:objects:bytes:
          3.2 ResolvedDeferredBinding>>value
  2.8 Processes at priority 90
    2.8 BlockClosure>>on:do:
      2.8 [] in ObjectMemory class>>installScavengeNotification
        2.8 ObjectMemory class>>scavengeOccurred

Since Martin noticed an excessive amount of time spent in the low space
process, he might be experiencing something similar.

Cheers,
Henry


Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Paul Baumann
The scavenge flushes bindings that need to be resolved later. This is something that I'd observed and mentioned years ago when namespace support was first added to VW. Caching the value that the profiler needs is (as Alan says) just going to move the cost elsewhere.

Flushing bindings really messes up profiling because things that should take no time at all now incur costs with some randomness. There is a cost there though, and it is larger and more prevalent than is commonly recognized. The profiler is correct in showing a cost, but the cost is not correctly attributed to the code being profiled.

What is really needed is to avoid flushing the bindings. Consider whether it is really necessary to flush strong binding references just because there was GC activity (when there were no code changes).

This issue affects some applications more than others due to growth and memory configuration. Memory can't be safely configured high enough to avoid these costs completely. Go much higher than the total memory allocated above "ObjectMemory sizesAtStartup: #(37.0 185.0 7.0 2.0 2.0 2.0)." and you start to get window problems--like a cascade of error messages when scrolling in an application window. You won't see the flush costs with simple example code. The flush/rebind costs are easy to see in a typical GBS-based application because scavenges are more frequent (a side effect of slow finalization in weak arrays used extensively by GBS).

Paul Baumann
IntercontinentalExchange | ICE


Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Alan Knight-2
How does scavenging flush bindings? That certainly isn't something I would have expected. And if the bindings are getting flushed, then I would expect to see time spent in resolving them. If the code being executed is in ResolvedDeferredBinding, then that means the binding was already resolved. If bindings were flushed, I'd expect to see the method show up in a different class.

Flushing the bindings (modifySystem: etc.) is definitely an expensive operation whose cost is not easy to observe, but I wouldn't have expected it to occur as a result of GC.


Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Terry Raymond
Paul

How is the scavenge flushing bindings?

Terry
 


Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Andres Valloud-6
Hello,
 
I had been working on similar things recently, and I got curious about this.  The method lowSpaceLoop is on the class side of ObjectMemory, so I decided to run the following expressions in a class browser where "self" is bound to ObjectMemory class.
 
Time millisecondsToRun:
    [
        1000000 timesRepeat: [LowSpaceMutex critical: [[CurrentMemoryPolicy yourself] ensure: [self yourself]]]
    ]    => 311

Time millisecondsToRun:
    [
        | lsm cmp |
        lsm := LowSpaceMutex.
        cmp := CurrentMemoryPolicy.
        1000000 timesRepeat: [lsm critical: [[cmp yourself] ensure: [self yourself]]]
    ]    => 343
 
 
So at first sight there is reason to think that loading the shared variables into temporary variables inside a closure makes the code run about 10% slower than simply referencing the shared variables.  I also thought it would be a good idea to turn the tests into methods and run them that way.
 
 
lowSpaceLoopTestA
 
 ^Time millisecondsToRun:
     [3000000 timesRepeat: [LowSpaceMutex critical: [[CurrentMemoryPolicy yourself] ensure: [self yourself]]]]
 
 
lowSpaceLoopTestB
 
 ^Time millisecondsToRun:
  [
   | lsm cmp |
   lsm := LowSpaceMutex.
   cmp := CurrentMemoryPolicy.
   3000000 timesRepeat: [lsm critical: [[cmp yourself] ensure: [self yourself]]]
  ]
 
 
 
(1 to: 10) collect: [:each | self lowSpaceLoopTestA]    => #(781 729 732 732 727 732 727 728 728 734)

(1 to: 10) collect: [:each | self lowSpaceLoopTestB]    => #(890 829 828 827 827 823 832 827 826 827)
 
 
So indeed, the B method is slower.
 
Andres.



Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Paul Baumann
"If the code being executed is in ResolvedDeferredBinding, then that means the binding was already resolved."
 
Consider the class changes that happen in ResolvedDeferredBinding>>flushBinding and DeferredBinding>>resolveBinding. The profiler shows "ResolvedDeferredBinding>>value" but it may have been a DeferredBinding before the profiler reported the class.
 
My recollection of the details may be off (specifically, that there is a flush caused by the GC activity itself), but there are binding costs that correlate with the amount of GC activity, so it could be a side effect of GC. Sorry, it has been many years since I looked into this. The flush was not obvious and took a lot of careful sampling to find; breakpoints and the like didn't help because of which process was executing the code and which handlers were involved.
 
Paul Baumann 
IntercontinentalExchange | ICE
 
 


Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Paul Baumann
That test doesn't include any GC activity that would flush the bindings; therefore, the test had none of the costs that Martin was avoiding for that method.
 
Your test only measured closure scope differences--insignificant for this context. Note how the cost changes slightly if you add instance variables to ObjectMemory.
 
lowSpaceLoopTestC
 
 ^Time millisecondsToRun:
  [
   tmpA := LowSpaceMutex.
   tmpB := CurrentMemoryPolicy.
   3000000 timesRepeat: [tmpA critical: [[tmpB yourself] ensure: [self yourself]]]
  ]
 
(1 to: 10) collect: [:each | ObjectMemory current lowSpaceLoopTestA]
    => #(1331 1335 1335 1343 1337 1334 1349 1339 1336 1336)    average 1337.5
(1 to: 10) collect: [:each | ObjectMemory current lowSpaceLoopTestB]
    => #(1862 1859 1873 1856 1867 1867 1865 1862 1868 1863)    average 1864.2
(1 to: 10) collect: [:each | ObjectMemory current lowSpaceLoopTestC]
    => #(1581 1563 1574 1568 1568 1568 1570 1570 1583 1570)    average 1571.5
Looks like I need to request a faster computer. Your numbers were much better than this old 3GHz Pentium D. :)
 
The very first time I ran the first test was just after the schema change. Note that the first number is higher than the rest.
 
(1 to: 10) collect: [:each | ObjectMemory current lowSpaceLoopTestA]
    => #(1630 1347 1363 1350 1346 1352 1352 1352 1357 1349)    average 1379.8
 
That first number shows a single rebinding cost (though not from a GC-related flush). The same kind of hit is incurred when there is GC activity that flushes bindings, and it shows up at random points in application use. The rebinding was justified in this case because of the schema change made for the test, but it seems unnecessary for plain GC activity. Again, the flushing correlates with GC activity and happens in ways that are not obvious.
 
Paul Baumann 
IntercontinentalExchange | ICE
 



Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Andres Valloud-6
> That test doesn't include any GC activity that would flush the bindings; therefore, the test had none of the costs that Martin was avoiding for that method.
 
Ok, however we have not yet determined how that happens.  Also, if it does happen, I'd tend to think it would be an issue of its own rather than grounds for changing how the method scopes those variables just for a performance gain.  Since the scoping change does not seem to add much benefit on its own, what can we do to find (or rule out) a flushed-binding problem?  Do you remember any of your experiments, perhaps some code path that showed up a lot?
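
A minimal workspace sketch of one such check, under two assumptions: that ObjectMemory garbageCollect forces a collection in this image, and that unresolved bindings are direct instances of DeferredBinding while resolved ones have had their class changed to ResolvedDeferredBinding.  If a collection really flushed resolved bindings, the second count would come out larger; an unchanged count would argue against a flush.

| unresolvedBefore unresolvedAfter |
"Count bindings still awaiting resolution, force a collection, then count again."
unresolvedBefore := DeferredBinding allInstances size.
ObjectMemory garbageCollect.
unresolvedAfter := DeferredBinding allInstances size.
Array with: unresolvedBefore with: unresolvedAfter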
 
Andres.


From: Paul Baumann [mailto:[hidden email]]
Sent: Thursday, January 08, 2009 10:48 AM
To: Valloud, Andres; VWNC
Subject: RE: [vwnc] [enh] VW7.6 Speeding up the low space process

That test doesn't include any GC activity that would flush the bindings; therefore, the test had none of the costs that Martin was avoiding for that method.
 
Your test only measured closure scope differences--insignificant for this context. Note how the cost changes slightly if you add instance variables to ObjectMemory.
 
lowSpaceLoopTestC
 
 ^Time millisecondsToRun:
  [
   tmpA := LowSpaceMutex.
   tmpB := CurrentMemoryPolicy.
   3000000 timesRepeat: [tmpA critical: [[tmpB yourself] ensure: [self yourself]]]
  ]
 
(1 to: 10) collect: [:each | ObjectMemory current lowSpaceLoopTestA].
 #(1331 1335 1335 1343 1337 1334 1349 1339 1336 1336) average 1337.5
(1 to: 10) collect: [:each | ObjectMemory current lowSpaceLoopTestB].
 #(1862 1859 1873 1856 1867 1867 1865 1862 1868 1863) average 1864.2
(1 to: 10) collect: [:each | ObjectMemory current lowSpaceLoopTestC].
 #(1581 1563 1574 1568 1568 1568 1570 1570 1583 1570) average 1571.5
Looks like I need to request a faster computer. Your numbers were much better than this old 3GHz Pentium D. :)
 
The very first time I ran the first test was just after the schema change. Note the first number is higher than the rest.
 
(1 to: 10) collect: [:each | ObjectMemory current lowSpaceLoopTestA].
 #(1630 1347 1363 1350 1346 1352 1352 1352 1357 1349) average 1379.8
 
That first number shows a single rebinding cost (not from a GC related flush though). That performance hit gets incurred when there is GC activity that flushes bindings. The hit is random in application use. It was justified in this case due to the schema change for the test, but seems unnecessary for GC activity. Again, the flush correlates with GC activity and happens in ways that are not obvious.
 
Paul Baumann 
IntercontinentalExchange | ICE
 


Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Alan Knight-2
In reply to this post by Paul Baumann
At 12:42 PM 1/8/2009, Paul Baumann wrote:
" If the code being executed is in ResolvedDeferredBinding, then that means the binding was already resolved."
 
Consider the class changes that happen in ResolvedDeferredBinding>>flushBinding and DeferredBinding>>resolveBinding. The profiler shows "ResolvedDeferredBinding>>value" but it may have been a DeferredBinding before the profiler reported the class.

That's a reasonable conjecture, but it's not what I see if I run

TimeProfiler profile: [
    1000000 timesRepeat: [
        a := DeferredBinding forKey: #MenuBackground.
        a method: (SymbolicPaint compiledMethodAt: #=).
        a value]].

There the time is attributed to DeferredBinding, not to ResolvedDeferredBinding, and it also shows me a great deal of detail about what's going on inside the resolution.

  My recollection of the details is off (that there is a flush from the GC activity itself), but there are binding costs that correlate with amount of GC activity. It could be a side effect of GC activity. Sorry, it has been many years since I looked into this. The flush was not obvious and took a lot of careful sampling before it was found. Breakpoints and such didn't help because of what was executing the code and handlers.

It would be useful to have more information about this. As it stands, I'm certainly unaware of any correlation of GC activity and binding flushing or lookup. I don't see any senders in the image that appear likely to be invoked due to GC activity. The closest thing I can see is that re-running initializers would cause bindings to be flushed. But I wouldn't expect that to be done at run-time any more than code modification.



From: Alan Knight [[hidden email]]
Sent: Thursday, January 08, 2009 10:23 AM
To: Paul Baumann; 'Henrik Johansen'; VWNC
Subject: Re: [vwnc] [enh] VW7.6 Speeding up the low space process

How does scavenging flush bindings? That certainly isn't something I would have expected. And if the bindings are getting flushed, then I would expect to see time spent in resolving them. If the code being executed is in ResolvedDeferredBinding, then that means the binding was already resolved. If bindings were flushed, I'd expect to see the method show up in a different class.

Flushing the bindings (modifySystem: etc) are definitely expensive operations whose cost is not easy to observe, but I wouldn't have expected them to occur as a result of GC.

At 10:16 AM 1/8/2009, Paul Baumann wrote:
The scavenge flushes bindings that need to be resolved later. This is something that I'd observed and mentioned years ago when namespace support was first added to VW. Caching the value that the profiler needs is (as Alan says) just going to move the cost elsewhere.

Flushing bindings really messes up profiling because things that should take no time at all now incur costs with some randomness. There is a cost there though, and it is larger and more prevalent than is commonly recognized. The profiler is correct in showing a cost, but the cost is not correctly attributed to the code being profiled.

What is really needed is to avoid flushing the bindings. Consider whether it is really necessary to flush strong binding references just because there was GC activity (and there were no code changes).

This issue affects some applications more than others due to growth and memory configuration. Memory can't be safely configured high enough to avoid these costs completely. Go much higher than the total memory allocated above "ObjectMemory sizesAtStartup: #(37.0 185.0 7.0 2.0 2.0 2.0)." and you start to get window problems--like a cascade of error messages when scrolling in an application window. You won't see the flush costs with simple example code. The flush/rebind costs are easy to see in a typical GBS-based application because scavenges are more frequent (a side effect of slow finalization in weak arrays used extensively by GBS).

Paul Baumann
IntercontinentalExchange | ICE

-----Original Message-----
From: [hidden email] [ [hidden email]] On Behalf Of Henrik Johansen
Sent: Thursday, January 08, 2009 3:49 AM
To: VWNC
Subject: Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Tested Martin's suggestion, no change in total runtime for a batch operation in the app I'm working on.
Though, I did have a similar problem some time ago; the solution then was to use a MemoryPolicy with an incrementalAllocationThreshold much larger than 10000 bytes. (This is for a batch operation, for which I don't really care about incremental GC as long as the machine does not run out of memory; I switch back to a normal policy afterwards.)

There have been quite a lot of other memory-allocation improvements in the app since then, but resetting incrementalAllocationThreshold to 10000 still turns:
(1m50s runtime total)
7.4 Processes at priority 90
    7.4 BlockClosure>>on:do:
      7.4 [] in ObjectMemory class>>installScavengeNotification
        7.4 ObjectMemory class>>scavengeOccurred
          7.4 primitives
 into:
(2m6s runtime total)
16.8 Processes at priority 91
    16.8 BlockClosure>>on:do:
      16.8 [] in ObjectMemory class>>installLowSpaceProcess
        16.8 ObjectMemory class>>lowSpaceLoop
          13.6 Semaphore>>critical:
            13.6 BlockClosure>>ensure:
              13.6 [] in ObjectMemory class>>lowSpaceLoop
                13.6 BlockClosure>>ensure:
                  13.6 [] in [] in ObjectMemory class>>lowSpaceLoop
                    13.6 AsapBatchMemoryPolicy>>lowSpaceAction
                      13.5 MemoryPolicy>>incrementalReclamation
                        13.5 MemoryPolicy>>incrementalGC
                          13.5 ObjectMemory
class>>primIncrementalGCIsInterruptable:objects:bytes:
          3.2 ResolvedDeferredBinding>>value
  2.8 Processes at priority 90
    2.8 BlockClosure>>on:do:
      2.8 [] in ObjectMemory class>>installScavengeNotification
        2.8 ObjectMemory class>>scavengeOccurred

Since Martin noticed excessive time spent in the low space process, he might be experiencing something similar.

Cheers,
Henry

Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Paul Baumann
In reply to this post by Paul Baumann
Alan,
 
Your explanation that "In general, if something happens that can't readily be interrupted by the profiler (e.g. spending time in primitives) then the profiler will end up interrupting the first instruction that runs after that activity is completed" is correct.
 
Binding sampling tests that I ran didn't show binding transitions to support a theory of flushing from GC activity. The overstated profiler measurements frequently fell on bindings but could be attributed to earlier waits and allocations. Short of reviewing VM code, this theory is dead.
 
Any way for the profiler to be improved?
 
Sorry for the noise.
 
Paul Baumann 
 


Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Michael Lucas-Smith-2
The alternative is to use a tracing profiler. They are, of course, a heck
of a lot slower than sampling profilers and often you get very little
bang for your buck when you use them.

Cheers,
Michael

Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Martin McClure
In reply to this post by Alan Knight-2
Alan Knight wrote:

> The problem is that that's more revealing issues with the profiler than it is performance penalties in the low space loop.

After taking a closer look at the profiler results, I agree that this is
a profiler thing.

However, in the past I have made significant real performance gains from
caching frequently-accessed shared variable values in temporary
variables or in instance variables. ResolvedDeferredBinding>>value isn't
terribly slow, but it is slow compared to local/instance variable
access, as shown by this code:

| foo bar result |
result := OrderedCollection new.
bar := 42.
result add: (Time microsecondsToRun: [100000000 timesRepeat: []]).
result add: (Time microsecondsToRun: [100000000 timesRepeat: [bar]]).
result add: (Time microsecondsToRun: [100000000 timesRepeat: [foo := bar]]).
result add: (Time microsecondsToRun: [100000000 timesRepeat: [foo := Object]]).
result add: (Time microsecondsToRun: [100000000 timesRepeat: [foo := Processor]]).
^result

  OrderedCollection (307167 338583 502880 676248 1013983)
  OrderedCollection (316692 357736 580825 783380 1123088)
  OrderedCollection (313947 359278 548183 759741 1030358)
  OrderedCollection (299633 305865 460892 669907 1010869)

The first result shows the time for an empty loop, about 3.0ns per iteration.

The second result forces the block to be a copying block, but does no
other additional work. This adds minimal overhead, very roughly 300ps.

The third result assigns a method local to another method local, with
overhead of roughly 1.8ns as compared to the no-work copying block.

The fourth result assigns a ResolvedDeferredConstant, with an overhead
of roughly 3.8ns as compared to the no-work copying block.

The final result assigns a ResolvedDeferredBinding, with an overhead of
roughly 7ns as compared to the no-work copying block.

So accessing a ResolvedDeferredBinding is roughly four times as
expensive as accessing a local copied into a block.

Testing where foo and bar are instvars yields very similar results.

---

So, I ask myself, if it's faster in my example, and I know that I've
seen real-world speedups by caching shared variable values locally, why
does Andres' example get slower when this is done?

After playing with his example, it's fairly clear why it does not get
faster. It's spending far more time in #critical: and the #ensure: than
it is accessing shared variables, so in this case the improvement from
that is down in the noise. The improvement is being more than offset
by other factors, possibly the replacement of a clean block
[CurrentMemoryPolicy yourself] with a copying block, and two copied
values instead of one in the containing block.
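
For illustration, a minimal workspace sketch of that distinction, with an arbitrary global standing in for the shared variable (the particular objects used here don't matter):

| shared cleanBlock copyingBlock |
"The first block references only a global, so the compiler can treat it as a
 clean block: a single literal block object shared by all activations.  The
 second block reads the temporary 'shared', so it is a copying block: a new
 block object carrying a copy of that value is built when the block expression
 is evaluated."
cleanBlock := [Object yourself].
shared := Object.
copyingBlock := [shared yourself].
Array with: cleanBlock value with: copyingBlock value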

Regards,

-Martin
Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Andres Valloud-6
Martin,

Much of the apparent slowdown goes away when tightening the measurement
environment.  On Windows and on this laptop, this means turning on core
affinity.  When doing so, here are the numbers I get:

(1 to: 10) collect: [:each | self lowSpaceLoopTestA] #(890 801 801 800
800 801 799 801 800 802)

(1 to: 10) collect: [:each | self lowSpaceLoopTestB] #(890 820 816 814
813 813 817 817 816 816)

If anything, the B method appears to be just a smidge slower.

Andres.

Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Alan Knight-2
In reply to this post by Martin McClure
Wow. If you're trying to tune at that level of detail you need to start being very careful about micro and macro performance characteristics. For one thing, be aware that running code in a workspace may perform significantly differently from running the same code in methods on a class.

I'm rather surprised to hear about significant real performance gains from an improvement that has to be run hundreds of millions of times to show fractions of a second improvement. Wouldn't you see more of an improvement inlining your methods into one great big one to avoid the overhead of message sends? Sending a message (#yourself) to the local results in a 20-30% overhead over just assigning it into a variable. Messages are considerably more frequent than access to classes/shareds, so eliminating them has the potential for much larger gains. :-)
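
A quick way to check that figure, in the same style as the measurements above (an illustrative sketch; the numbers will of course vary by machine):

| foo bar |
bar := 42.
"Compare a bare assignment into a local with sending #yourself to the local."
Array
    with: (Time microsecondsToRun: [100000000 timesRepeat: [foo := bar]])
    with: (Time microsecondsToRun: [100000000 timesRepeat: [bar yourself]])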

I also don't understand the results listed here. It seems to me that the resolved deferred binding here shows as a bit more than half the speed of accessing a local (3rd vs. 5th). The only thing that even approaches being four times as expensive is the ResolvedDeferredBinding vs. no-op.

At 06:54 PM 1/8/2009, Martin McClure wrote:
Alan Knight wrote:

The problem is that that's more revealing issues with the profiler than it is performance penalties in the low space loop.

After taking a closer look at the profiler results, I agree that this is a profiler thing.

However, in the past I have made significant real performance gains from caching frequently-accessed shared variable values in temporary variables or in instance variables. ResolvedDeferredBinding>>value isn't terribly slow, but it is slow compared to local/instance variable access, as shown by this code:

| foo bar result |
result := OrderedCollection new.
bar := 42.
result add: (Time microsecondsToRun: [100000000 timesRepeat: []]).
result add: (Time microsecondsToRun: [100000000 timesRepeat: [bar]]).
result add: (Time microsecondsToRun: [100000000 timesRepeat: [foo := bar]]).
result add: (Time microsecondsToRun: [100000000 timesRepeat: [foo := Object]]).
result add: (Time microsecondsToRun: [100000000 timesRepeat: [foo := Processor]]).
^result

 OrderedCollection (307167 338583 502880 676248 1013983)
 OrderedCollection (316692 357736 580825 783380 1123088)
 OrderedCollection (313947 359278 548183 759741 1030358)
 OrderedCollection (299633 305865 460892 669907 1010869)

The first result shows the time for an empty loop, about 3.0ns.

The second result forces the block to be a copying block, but does no other additional work. This adds minimal overhead, very roughly 300ps.

The third result assigns a method local to another method local, with overhead of roughly 1.8ns as compared to the no-work copying block.

The fourth result assigns a ResolvedDeferredConstant, with an overhead of roughly 3.8ns as compared to the no-work copying block.

The final result assigns a ResolvedDeferredBinding, with an overhead of roughly 7ns as compared to the no-work copying block.

So accessing a ResolvedDeferredBinding is roughly four times as expensive as accessing a local copied into a block.

Testing where foo and bar are instvars yields very similar results.

---

So, I ask myself, if it's faster in my example, and I know that I've seen real-world speedups by caching shared variable values locally, why does Andres' example get slower when this is done?

After playing with his example, it's fairly clear why it does not get faster. It's spending far more time in #critical: and the #ensure: than it is accessing shared variables, so in this case the improvement from that is down in the noise. The improvement is more than offset by other factors, possibly the replacement of a clean block [CurrentMemoryPolicy yourself] with a copying block, and two copied values instead of one in the containing block.

Regards,

-Martin

--
Alan Knight [|], Engineering Manager, Cincom Smalltalk

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Eliot Miranda-2
In reply to this post by Michael Lucas-Smith-2
Take a look at how Andreas Raab fixed the Squeak profiler.  His work correctly attributes samples to primitives by having the VM remember which method was running when the profile clock ticks, instead of allowing the image to do it and get confused because it samples only at a suspension point.  In Andreas' scheme the profile clock is tested when a primitive is activated.

To make this work in VisualWorks you'd (probably) have to disable translated primitives during the profile and modify the primitive glue to test the clock.  This would be a fair amount of work, would give you a decent profiler, but you wouldn't be profiling what you wanted to (the VM with translated primitives).

An alternative approach would be to have a proper interrupt timer and for the interrupt handler to derive the method the pc is currently in.  i.e. if it was in native code it would attribute the tick to the method under the pc.  If the pc wasn't in native code the current method would be discovered through currentStack.  This would suit VisualWorks better, but you'd have to support an interrupt tick thread on all platforms you wanted to profile on.  There's code I wrote for the VM profiler that you could use on Unix and Windows, but on Unix one can't depend on setitimer because its frequency is too low (50 Hz).  Instead you'd have to implement a thread as I've just described on my blog.


On Thu, Jan 8, 2009 at 3:25 PM, Michael Lucas-Smith <[hidden email]> wrote:
The alternative is to use a tracing profiler. They are, of course, a heck
of a lot slower than sampling profilers and often you get very little
bang for your buck when you use them.

Cheers,
Michael

Paul Baumann wrote:
> Alan,
>
> Your explanation that "In general, if something happens that can't
> readily be interrupted by the profiler (e.g. spending time in
> primitives) then the profiler will end up interrupting the first
> instruction that runs after that activity is completed" is correct.
>
> Binding sampling tests that I ran didn't show binding transitions to
> support a theory of flushing from GC activity. The overstated profiler
> measurements frequently fell on bindings but could be attributed to
> earlier waits and allocations. Short of reviewing VM code, this theory
> is dead.
>
> Any way for the profiler to be improved?
>
> Sorry for the noise.
>
> *Paul Baumann*
>
>
> ------------------------------------------------------------------------
> *From:* Alan Knight [mailto:[hidden email]]
> *Sent:* Thursday, January 08, 2009 3:36 PM
> *To:* Paul Baumann; 'Henrik Johansen'; VWNC
> *Subject:* RE: [vwnc] [enh] VW7.6 Speeding up the low space process
>
> At 12:42 PM 1/8/2009, Paul Baumann wrote:
>> " If the code being executed is in ResolvedDeferredBinding, then that
>> means the binding was already resolved."
>>
>> Consider the class changes that happen in
>> ResolvedDeferredBinding>>flushBinding and
>> DeferredBinding>>resolveBinding. The profiler shows
>> "ResolvedDeferredBinding>>value" but it may have been a
>> DeferredBinding before the profiler reported the class.
>
> That's a reasonable conjecture, but it's not what I see if I run
>
> TimeProfiler profile: [
> 1000000 timesRepeat: [
> a := DeferredBinding forKey: #MenuBackground.
> a method: (SymbolicPaint compiledMethodAt: #=).
> a value]].
>
> There the time is attributed to DeferredBinding, not to
> ResolvedDeferredBinding, and it also shows me a great deal of detail
> about what's going on inside the resolution.
>
>>   My recollection of the details is off (that there is a flush from
>> the GC activity itself), but there are binding costs that correlate
>> with amount of GC activity. It could be a side effect of GC activity.
>> Sorry, it has been many years since I looked into this. The flush was
>> not obvious and took a lot of careful sampling before it was found.
>> Breakpoints and such didn't help because of what was executing the
>> code and handlers.
>
> It would be useful to have more information about this. As it stands,
> I'm certainly unaware of any correlation of GC activity and binding
> flushing or lookup. I don't see any senders in the image that appear
> likely to be invoked due to GC activity. The closest thing I can see
> is that re-running initializers would cause bindings to be flushed.
> But I wouldn't expect that to be done at run-time any more than code
> modification.
>
>
>> *Paul Baumann*
>> IntercontinentalExchange | ICE
>>
>>
>> ------------------------------------------------------------------------
>> *From:* Alan Knight [ mailto:[hidden email]]
>> *Sent:* Thursday, January 08, 2009 10:23 AM
>> *To:* Paul Baumann; 'Henrik Johansen'; VWNC
>> *Subject:* Re: [vwnc] [enh] VW7.6 Speeding up the low space process
>>
>> How does scavenging flush bindings? That certainly isn't something I
>> would have expected. And if the bindings are getting flushed, then I
>> would expect to see time spent in resolving them. If the code being
>> executed is in ResolvedDeferredBinding, then that means the binding
>> was already resolved. If bindings were flushed, I'd expect to see the
>> method show up in a different class.
>>
>> Flushing the bindings (modifySystem: etc) are definitely expensive
>> operations whose cost is not easy to observe, but I wouldn't have
>> expected them to occur as a result of GC.
>>
>> At 10:16 AM 1/8/2009, Paul Baumann wrote:
>>> The scavenge flushes bindings that need to be resolved later. This
>>> is something that I'd observed and mentioned years ago when
>>> namespace support was first added to VW. Caching the value that the
>>> profiler needs is (as Alan says) just going to move the cost elsewhere.
>>>
>>> Flushing bindings really messes up profiling because things that
>>> should take no time at all now incur costs with some randomness.
>>> There is a cost there though, and it is larger and more prevalent
>>> than is commonly recognized. The profiler is correct in showing a
>>> cost, but the cost is not correctly attributed to the code being
>>> profiled.
>>>
>>> What is really needed is to avoid flushing the bindings. Consider
>>> whether it is really necessary to flush of strong binding references
>>> just because there was GC activity (and there were no code changes).
>>>
>>> This issue affects some applications more than others due to growth
>>> and memory configuration. Memory can't be safely configured high
>>> enough to avoid these costs completely. Go much higher than the
>>> total memory allocated above "ObjectMemory sizesAtStartup: #(37.0
>>> 185.0 7.0 2.0 2.0 2.0)." and you start to get window problems--like
>>> a cascade of error messages when scrolling in an application window.
>>> You won't see the flush costs with simple example code. The
>>> flush/rebind costs are easy to see in a typical GBS-based
>>> application because scavenges are more frequent (a side effect of
>>> slow finalization in weak arrays used extensively by GBS).
>>>
>>> Paul Baumann
>>> IntercontinentalExchange | ICE
>>>
>>> -----Original Message-----
>>> From: [hidden email] [ mailto:[hidden email]] On
>>> Behalf Of Henrik Johansen
>>> Sent: Thursday, January 08, 2009 3:49 AM
>>> To: VWNC
>>> Subject: Re: [vwnc] [enh] VW7.6 Speeding up the low space process
>>>
>>> Tested Martin's suggestion, no change in total runtime for a batch
>>> operation in the app I'm working on.
>>> Though, I did have a similar problem some time ago, the solution
>>> then was to use a MemoryPolicy with an incrementalAllocationThreshold
>>> much larger than 10000 bytes. (This is for a batch operation, for
>>> which I don't really care about inc gc as long as the machine does
>>> not run out of memory, I switch back to a normal policy afterwards)
>>>
>>> There have been quite a lot of other memory-allocation improvements in
>>> the app since then, but resetting incrementalAllocationThreshold to
>>> 10000 still turns:
>>> (1m50s runtime total)
>>> 7.4 Processes at priority 90
>>>     7.4 BlockClosure>>on:do:
>>>       7.4 [] in ObjectMemory class>>installScavengeNotification
>>>         7.4 ObjectMemory class>>scavengeOccurred
>>>           7.4 primitives
>>>  into:
>>> (2m6s runtime total)
>>> 16.8 Processes at priority 91
>>>     16.8 BlockClosure>>on:do:
>>>       16.8 [] in ObjectMemory class>>installLowSpaceProcess
>>>         16.8 ObjectMemory class>>lowSpaceLoop
>>>           13.6 Semaphore>>critical:
>>>             13.6 BlockClosure>>ensure:
>>>               13.6 [] in ObjectMemory class>>lowSpaceLoop
>>>                 13.6 BlockClosure>>ensure:
>>>                   13.6 [] in [] in ObjectMemory class>>lowSpaceLoop
>>>                     13.6 AsapBatchMemoryPolicy>>lowSpaceAction
>>>                       13.5 MemoryPolicy>>incrementalReclamation
>>>                         13.5 MemoryPolicy>>incrementalGC
>>>                           13.5 ObjectMemory
>>> class>>primIncrementalGCIsInterruptable:objects:bytes:
>>>           3.2 ResolvedDeferredBinding>>value
>>>   2.8 Processes at priority 90
>>>     2.8 BlockClosure>>on:do:
>>>       2.8 [] in ObjectMemory class>>installScavengeNotification
>>>         2.8 ObjectMemory class>>scavengeOccurred
>>>
>>> As Martin noticed excessive time being spent in the low space process, he might be
>>> experiencing something similar.
>>>
>>> Cheers,
>>> Henry
>>>
>>> Alan Knight wrote:
>>> > The problem is that that's more revealing issues with the profiler
>>> > than it is performance penalties in the low space loop. In general, if
>>> > something happens that can't readily be interrupted by the profiler
>>> > (e.g. spending time in primitives) then the profiler will end up
>>> > interrupting the first instruction that runs after that activity is
>>> > completed. That doesn't even necessarily have to be in the same
>>> > process that was doing the work. So the very first message send of
>>> > lowSpaceLoop is one of those places that is likely to show up as
>>> > consuming a large amount of time, even though, if we consider the
>>> > matter, it's hard to see how the send of a message whose runtime is
>>> > normally around that of a constant would actually make a difference.
>>> > If you're profiling, it's important to make sure not only that you see
>>> > what the profiler shows, but that changing it makes a significant
>>> > difference. The only situation where resolving bindings is slow is
>>> > immediately after a modifySystem:, but then it won't be the method in
>>> > ResolvedDeferredBinding. So, for example
>>> >
>>> > So, if we extract the first ResolvedDeferredBinding from that method, and
>>> > in the inspector evaluate
>>> >
>>> > Time millisecondsToRun: [10000000 timesRepeat: [self value]]    => 214
>>> > , 213, 176, 203
>>> > vs.
>>> > Time millisecondsToRun: [10000000 timesRepeat: [3 value]]  =>   133,
>>> > 112, 121, 124
>>> >
>>> > it doesn't seem like something that would provide a big improvement.
>>> >
>>> > At 09:57 PM 1/7/2009, Valloud, Andres wrote:
>>> >> Thank you!  AR 55983.
>>> >>
>>> >> Andres.
>>> >>
>>> >> -----Original Message-----
>>> >> From: [hidden email] [ mailto:[hidden email]] On
>>> >> Behalf Of Martin McClure
>>> >> Sent: Wednesday, January 07, 2009 6:49 PM
>>> >> To: VWNC
>>> >> Subject: [vwnc] [enh] VW7.6 Speeding up the low space process
>>> >>
>>> >> I was doing some profiling runs of something that was spending a lot
>>> >> of time in garbage collection. I noticed that the LowSpaceProcess was
>>> >> spending a full 1/3 of its time in ResolvedDeferredBinding>>value.
>>> >>
>>> >> I eliminated this by replacing ObjectMemory class>>lowSpaceLoop with
>>> >> the version below, which moves the three shared variable accesses to
>>> >> outside the loop.
>>> >>
>>> >> Perhaps you want to do something like this in VW7.7?
>>> >>
>>> >> Regards,
>>> >>
>>> >> -Martin
>>> >>
>>> >>
>>> >> lowSpaceLoop
>>> >>
>>> >>         | lss lsm cmp |
>>> >>         lss := LowSpaceSemaphore.
>>> >>         lsm := LowSpaceMutex.
>>> >>         cmp := CurrentMemoryPolicy.
>>> >>         [lss wait.
>>> >>         lsm critical: [[cmp lowSpaceAction] ensure: [self
>>> >> resetLowSpaceSignal]]]
>>> >>                  repeat
>>> >
>>> > --
>>> > Alan Knight [|], Engineering Manager, Cincom Smalltalk [hidden email]
>>> > [hidden email] http://www.cincom.com/smalltalk
>>> >
>>>
>>>
>>
>> --
>> Alan Knight [|], Engineering Manager, Cincom Smalltalk
>> [hidden email]
>> [hidden email]
>> http://www.cincom.com/smalltalk
>>
>
> --
> Alan Knight [|], Engineering Manager, Cincom Smalltalk
> [hidden email]
> [hidden email]
> http://www.cincom.com/smalltalk
>
>
>

_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc



Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Martin McClure
In reply to this post by Martin McClure
Alan Knight wrote:
> Wow. If you're trying to tune at that level of detail you need to start
> being very careful about micro and macro performance characteristics.
> For one thing, be careful that running code in a workspace may have
> significantly different performance from running it in methods on a class.

In this case, I tried it both ways, with very close results.

>
> I'm rather surprised to hear about significant real performance gains
> from an improvement that has to be run hundreds of millions of times to
> show fractions of a second improvement.

Some of our stuff is indeed run hundreds of millions of times, in an
environment where milliseconds matter.

> Wouldn't you see more of an
> improvement inlining your methods into one great big one to avoid the
> overhead of message sends? Sending a message (#yourself) to the local
> results in a 20-30% overhead over just assigning it into a variable.
> Messages are considerably more frequent than access to classes/shareds,
> so eliminating them has the potential for much larger gains. :-)

We've gotten to the point where elimination of a single message send in
the most-frequently-executed portion of our code results in a small but
measurable improvement in overall performance. So we have been
eliminating message sends wherever reasonable. We've done some hand
inlining of methods, which helped quite a bit, but we don't really want
one big method. :-)

Each ResolvedDeferredBinding access costs two message sends. And we had
a couple of those that were *very* frequently accessed. Caching those
led to a fairly nice performance gain.
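
A minimal sketch of that caching pattern, assuming a class with a
cachedProcessor instance variable; Processor stands in for whichever shared
variable the hot path actually reads, and both selectors are made up:

setUpCache
        "Resolve the shared variable once, outside the hot path.
         cachedProcessor is an assumed instance variable."

        cachedProcessor := Processor

hotLoop
        "The hot path now reads an instance variable instead of going through
         ResolvedDeferredBinding>>value on every iteration."

        1000000 timesRepeat: [cachedProcessor yourself]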

>
> I also don't understand the results listed here. It seems to me that the
> resolved deferred binding here shows as a bit more than half the speed
> of accessing a local (3rd vs. 5th). The only thing that even approaches
> being four times as expensive is the ResolvedDeferredBinding vs. no-op.
>

The first two tests are there to see what a do-nothing loop costs. What
I want to compare is the cost to assign a local to a local vs. the cost
to assign a shared variable to a local. In order to do that accurately,
I need to subtract the overhead of the loop. Since the loop itself is
more than half the total cost of the local-to-local test, whether or not
you subtract the loop costs really makes a difference to the results.
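
Worked through on the first run of the numbers quoted above (microseconds for
100,000,000 iterations), the subtraction looks like this:

| times base |
times := #(307167 338583 502880 676248 1013983).
base := times at: 2.    "no-work copying block"
"Per-iteration overhead in nanoseconds for local-to-local,
 ResolvedDeferredConstant and ResolvedDeferredBinding: roughly 1.6, 3.4 and
 6.8 ns on this run."
(3 to: 5) collect: [:i | ((times at: i) - base) * 1000 / 100000000.0]

Averaged over the four runs, those come out near the 1.8, 3.8 and 7 ns quoted
earlier, and the binding overhead is roughly four times the local-to-local
overhead once the loop cost is subtracted.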

Regards,

-Martin
_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Holger Guhl
In reply to this post by Andres Valloud-6
Flushing bindings always happens after compiling or adding classes. See SystemUtils class>>linkSystem, sent by #modifySystem:. So if the process that forces the system into a low space situation is adding classes, it is pretty normal that all bindings are flushed.
Just an idea for a scenario that might cause this trouble...
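
One way to check whether that scenario applies is to profile the operation that
drives the image into low space, using the same TimeProfiler entry point Alan
used earlier (runTheSuspectBatch is a placeholder):

"runTheSuspectBatch stands in for whatever drives the image into low space.
 If SystemUtils class>>linkSystem or DeferredBinding>>resolveBinding shows up
 in the resulting tree, class compilation is flushing bindings during the run."
TimeProfiler profile: [self runTheSuspectBatch]
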
Holger Guhl
-- 
Senior Consultant * Certified Scrum Master * [hidden email]
Tel: +49 231 9 75 99 21 * Fax: +49 231 9 75 99 20
Georg Heeg eK Dortmund
Handelsregister: Amtsgericht Dortmund  A 12812

Valloud, Andres schrieb:
> That test doesn't include any GC activity that would flush the bindings; therefore, the test had none of the costs that Martin was avoiding for that method.
 
Ok, however we have not yet determined how that happens.  Also, if it does happen, I'd tend to think that would be an issue of its own rather than grounds for changing scoping rules for performance gain.  Since the scoping rules do not seem to add much benefit on their own, what can we do to find (or rule out) a flushed binding problem?  Do you remember any of your experiments, perhaps some code path that you saw a lot?
 
Andres.


From: Paul Baumann [[hidden email]]
Sent: Thursday, January 08, 2009 10:48 AM
To: Valloud, Andres; VWNC
Subject: RE: [vwnc] [enh] VW7.6 Speeding up the low space process

That test doesn't include any GC activity that would flush the bindings; therefore, the test had none of the costs that Martin was avoiding for that method.
 
Your test only measured closure scope differences--insignificant for this context. Note how the cost changes slightly if you add instance variables to ObjectMemory.
 
lowSpaceLoopTestC
 
 ^Time millisecondsToRun:
  [
   tmpA := LowSpaceMutex.
   tmpB := CurrentMemoryPolicy.
   3000000 timesRepeat: [tmpA critical: [[tmpB yourself] ensure: [self yourself]]]
  ]
 
(1 to: 10) collect: [:each | ObjectMemory current lowSpaceLoopTestA].
 #(1331 1335 1335 1343 1337 1334 1349 1339 1336 1336) average 1337.5
(1 to: 10) collect: [:each | ObjectMemory current lowSpaceLoopTestB].
 #(1862 1859 1873 1856 1867 1867 1865 1862 1868 1863) average 1864.2
(1 to: 10) collect: [:each | ObjectMemory current lowSpaceLoopTestC].
 #(1581 1563 1574 1568 1568 1568 1570 1570 1583 1570) average 1571.5
Looks like I need to request a faster computer. Your numbers were much better than this old 3GHz Pentium D. :)
 
The very first time I ran the first test was just after the schema change. Note the first number is higher than the rest.
 
(1 to: 10) collect: [:each | ObjectMemory current lowSpaceLoopTestA].
 #(1630 1347 1363 1350 1346 1352 1352 1352 1357 1349) average 1379.8
 
That first number shows a single rebinding cost (not from a GC related flush though). That performance hit gets incurred when there is GC activity that flushes bindings. The hit is random in application use. It was justified in this case due to the schema change for the test, but seems unnecessary for GC activity. Again, the flush correlates with GC activity and happens in ways that are not obvious.
 
Paul Baumann 
IntercontinentalExchange | ICE
 


From: [hidden email] [[hidden email]] On Behalf Of Valloud, Andres
Sent: Thursday, January 08, 2009 12:17 PM
To: VWNC
Subject: Re: [vwnc] [enh] VW7.6 Speeding up the low space process

Hello,
 
I had been working on similar things recently, and I got curious about this.  The method lowSpaceLoop is on the class side of ObjectMemory, so I decided to run the following expressions in a class browser where "self" is bound to be ObjectMemory class.
 
Time millisecondsToRun:
    [
        1000000 timesRepeat: [LowSpaceMutex critical: [[CurrentMemoryPolicy yourself] ensure: [self yourself]]]
    ] 311

Time millisecondsToRun:
    [
        | lsm cmp |
        lsm := LowSpaceMutex.
        cmp := CurrentMemoryPolicy.
        1000000 timesRepeat: [lsm critical: [[cmp yourself] ensure: [self yourself]]]
    ] 343
 
 
So at first sight there's reason to think that loading the shared variables into temporary variables inside a closure makes the code run about 10% slower than when it simply references the shared variables.  I also thought it would be a good idea to make methods and run the test in this manner.
 
 
lowSpaceLoopTestA
 
 ^Time millisecondsToRun:
     [3000000 timesRepeat: [LowSpaceMutex critical: [[CurrentMemoryPolicy yourself] ensure: [self yourself]]]]
 
 
lowSpaceLoopTestB
 
 ^Time millisecondsToRun:
  [
   | lsm cmp |
   lsm := LowSpaceMutex.
   cmp := CurrentMemoryPolicy.
   3000000 timesRepeat: [lsm critical: [[cmp yourself] ensure: [self yourself]]]
  ]
 
 
 
(1 to: 10) collect: [:each | self lowSpaceLoopTestA] #(781 729 732 732 727 732 727 728 728 734)
 
(1 to: 10) collect: [:each | self lowSpaceLoopTestB] #(890 829 828 827 827 823 832 827 826 827)
 
 
So indeed, the B method is slower.
 
Andres.


From: Alan Knight [[hidden email]]
Sent: Wednesday, January 07, 2009 7:31 PM
To: Valloud, Andres; VWNC
Subject: Re: [vwnc] [enh] VW7.6 Speeding up the low space process

The problem is that that's more revealing issues with the profiler than it is performance penalties in the low space loop. In general, if something happens that can't readily be interrupted by the profiler (e.g. spending time in primitives) then the profiler will end up interrupting the first instruction that runs after that activity is completed. That doesn't even necessarily have to be in the same process that was doing the work. So the very first message send of  lowSpaceLoop is one of those places that is likely to show up as consuming a large amount of time, even though, if we consider the matter, it's hard to see how the send of a message whose runtime is normally around that of a constant would actually make a difference. If you're profiling, it's important to make sure not only that you see what the profiler shows, but that changing it makes a significant difference. The only situation where resolving bindings is slow is immediately after a modifySystem:, but then it won't be the method in ResolvedDeferredBinding. So, for example

So, if we extract the first ResolvedDeferredBinding from that method, and in the inspector evaluate

Time millisecondsToRun: [10000000 timesRepeat: [self value]]    => 214 , 213, 176, 203
vs.
Time millisecondsToRun: [10000000 timesRepeat: [3 value]]  =>   133, 112, 121, 124

it doesn't seem like something that would provide a big improvement.

At 09:57 PM 1/7/2009, Valloud, Andres wrote:
Thank you!  AR 55983.

Andres.

-----Original Message-----
From: [hidden email] [[hidden email]] On
Behalf Of Martin McClure
Sent: Wednesday, January 07, 2009 6:49 PM
To: VWNC
Subject: [vwnc] [enh] VW7.6 Speeding up the low space process

I was doing some profiling runs of something that was spending a lot of
time in garbage collection. I noticed that the LowSpaceProcess was
spending a full 1/3 of its time in ResolvedDeferredBinding>>value.

I eliminated this by replacing ObjectMemory class>>lowSpaceLoop with the
version below, which moves the three shared variable accesses to outside
the loop.

Perhaps you want to do something like this in VW7.7?

Regards,

-Martin


lowSpaceLoop
        
        | lss lsm cmp |
        lss := LowSpaceSemaphore.
        lsm := LowSpaceMutex.
        cmp := CurrentMemoryPolicy.
        [lss wait.
        lsm critical: [[cmp lowSpaceAction] ensure: [self
resetLowSpaceSignal]]]
                 repeat


--
Alan Knight [|], Engineering Manager, Cincom Smalltalk




_______________________________________________
vwnc mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc