Profiling

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Profiling

Colin Putney-3
Hi all,

I'm trying to optimize some slow code, but having trouble getting a time profile that shows where the time is being spent. Here's the results from MessageTally class>>spyOn:

 - 31249 tallies, 31251 msec.

**Tree**
--------------------------------
Process: other processes
--------------------------------
3.9% {1233ms} EventSensor>>eventTickler
  3.9% {1231ms} Delay>>wait
--------------------------------
Process: (40s) 77070: nil
--------------------------------
95.5% {29853ms} ALReadWriteTransaction(ALTransaction)>>commit:
  95.5% {29848ms} ALReadWriteTransaction>>commit
    94.7% {29593ms} Dictionary>>keysAndValuesDo:
      94.7% {29593ms} Dictionary>>associationsDo:
**Leaves**
94.7% {29593ms} Dictionary>>associationsDo:
3.9% {1231ms} Delay>>wait

**Memory**
old +891,120 bytes
young -273,076 bytes
used +618,044 bytes
free -618,044 bytes

**GCs**
full 0 totalling 0 ms (0% uptime)
incr 56 totalling 28 ms (0.1% uptime), avg 0.5 ms
tenures 12 (avg 4 GCs/tenure)
root table 0 overflows


Obviously, Dictionary>>associationsDo: isn't really a leaf method. I suspect that Cog is generating native code for the contents of the loop and it's not getting interrupted by the tally process. (I've tried running on an interpreter VM, but it fails with FFI errors, so no help there.)

Anybody have any ideas for how to get a better profile?

Colin


Reply | Threaded
Open this post in threaded view
|

Re: Profiling

Levente Uzonyi-2
On Wed, 9 Jan 2013, Colin Putney wrote:

> Hi all,
> I'm trying to optimize some slow code, but having trouble getting a time profile that shows where the time is being spent. Here's the results from MessageTally class>>spyOn:
>
>        - 31249 tallies, 31251 msec.
>
> **Tree**
> --------------------------------
> Process: other processes
> --------------------------------
> 3.9% {1233ms} EventSensor>>eventTickler
>   3.9% {1231ms} Delay>>wait
> --------------------------------
> Process: (40s) 77070: nil
> --------------------------------
> 95.5% {29853ms} ALReadWriteTransaction(ALTransaction)>>commit:
>   95.5% {29848ms} ALReadWriteTransaction>>commit
>     94.7% {29593ms} Dictionary>>keysAndValuesDo:
>       94.7% {29593ms} Dictionary>>associationsDo:
> **Leaves**
> 94.7% {29593ms} Dictionary>>associationsDo:
> 3.9% {1231ms} Delay>>wait
>
> **Memory**
> old +891,120 bytes
> young -273,076 bytes
> used +618,044 bytes
> free -618,044 bytes
>
> **GCs**
> full 0 totalling 0 ms (0% uptime)
> incr 56 totalling 28 ms (0.1% uptime), avg 0.5 ms
> tenures 12 (avg 4 GCs/tenure)
> root table 0 overflows
>
>
>
> Obviously, Dictionary>>associationsDo: isn't really a leaf method. I suspect that Cog is generating native code for the contents of the loop and it's not getting interrupted by the tally process. (I've tried
> running on an interpreter VM, but it fails with FFI errors, so no help there.)
>
> Anybody have any ideas for how to get a better profile?
Qwaq made two profilers, one for the VM and one for the image. The
latter's port for Pharo can be found here:
http://www.squeaksource.com/SystemProfilerPharo.html
(IIRC it loads into Squeak too.)

It uses some primitives which were added to the CogVM, so it requires Cog.
We could reimplement it from scratch using the same primitives and
get a better profiler in the image. We can't include this due to the GPL
license.


Levente

>
> Colin
>
>

Reply | Threaded
Open this post in threaded view
|

Re: Profiling

Bob Arning-2
In reply to this post by Colin Putney-3
Can you write a snippet that exhibits the same behavior, both to show us and to try on non-cog? What does #commit look like? What does the dictionary look like?

Cheers,
Bob

On 1/9/13 11:33 PM, Colin Putney wrote:
Hi all,

I'm trying to optimize some slow code, but having trouble getting a time profile that shows where the time is being spent. Here's the results from MessageTally class>>spyOn:

 - 31249 tallies, 31251 msec.

**Tree**
--------------------------------
Process: other processes
--------------------------------
3.9% {1233ms} EventSensor>>eventTickler
  3.9% {1231ms} Delay>>wait
--------------------------------
Process: (40s) 77070: nil
--------------------------------
95.5% {29853ms} ALReadWriteTransaction(ALTransaction)>>commit:
  95.5% {29848ms} ALReadWriteTransaction>>commit
    94.7% {29593ms} Dictionary>>keysAndValuesDo:
      94.7% {29593ms} Dictionary>>associationsDo:
**Leaves**
94.7% {29593ms} Dictionary>>associationsDo:
3.9% {1231ms} Delay>>wait

**Memory**
old +891,120 bytes
young -273,076 bytes
used +618,044 bytes
free -618,044 bytes

**GCs**
full 0 totalling 0 ms (0% uptime)
incr 56 totalling 28 ms (0.1% uptime), avg 0.5 ms
tenures 12 (avg 4 GCs/tenure)
root table 0 overflows


Obviously, Dictionary>>associationsDo: isn't really a leaf method. I suspect that Cog is generating native code for the contents of the loop and it's not getting interrupted by the tally process. (I've tried running on an interpreter VM, but it fails with FFI errors, so no help there.)

Anybody have any ideas for how to get a better profile?

Colin



    



Reply | Threaded
Open this post in threaded view
|

Re: Profiling

Eliot Miranda-2
In reply to this post by Colin Putney-3



On Wed, Jan 9, 2013 at 8:33 PM, Colin Putney <[hidden email]> wrote:
Hi all,

I'm trying to optimize some slow code, but having trouble getting a time profile that shows where the time is being spent. Here's the results from MessageTally class>>spyOn:

 - 31249 tallies, 31251 msec.

**Tree**
--------------------------------
Process: other processes
--------------------------------
3.9% {1233ms} EventSensor>>eventTickler
  3.9% {1231ms} Delay>>wait
--------------------------------
Process: (40s) 77070: nil
--------------------------------
95.5% {29853ms} ALReadWriteTransaction(ALTransaction)>>commit:
  95.5% {29848ms} ALReadWriteTransaction>>commit
    94.7% {29593ms} Dictionary>>keysAndValuesDo:
      94.7% {29593ms} Dictionary>>associationsDo:
**Leaves**
94.7% {29593ms} Dictionary>>associationsDo:
3.9% {1231ms} Delay>>wait

**Memory**
old +891,120 bytes
young -273,076 bytes
used +618,044 bytes
free -618,044 bytes

**GCs**
full 0 totalling 0 ms (0% uptime)
incr 56 totalling 28 ms (0.1% uptime), avg 0.5 ms
tenures 12 (avg 4 GCs/tenure)
root table 0 overflows


Obviously, Dictionary>>associationsDo: isn't really a leaf method. I suspect that Cog is generating native code for the contents of the loop and it's not getting interrupted by the tally process. (I've tried running on an interpreter VM, but it fails with FFI errors, so no help there.)

Yes, Cog will generate native code for associationsDo: and the block its running.  No, Cog won't prevent that block being interrupted by the tally process.

e.g.

| count p |
count := 0.
p := [[count < 10] whileTrue: [(Delay forMilliseconds: 50) wait. count := count + 1]] newProcess.
p priority: Processor activePriority + 1.
p resume.
[count < 10] whileTrue: [Smalltalk globals "*" associationsDo: [:nada|]].
count

Substitute a number of blocks, e.g.

| count p |
count := 0.
p := [[count < 10] whileTrue: [(Delay forMilliseconds: 50) wait. count := count + 1]] newProcess.
p priority: Processor activePriority + 1.
p resume.
[count < 10] whileTrue: [Smalltalk globals associationsDo: [:nada| | i | i := 0. [count < 10] whileTrue: [i even ifTrue: [i := i + 1] ifFalse: [i := i - 1]]]].
count

However, if all the associationsDo: block does is invoke a primitive then the Squeak profiler will not produce useful info.  The default profiler doesn't profile primitives correctly.  As others in this thread have pointed out there;s a better profiler written by Andreas Raab, which does profile primitives properly.

(* in 4.4 use e.g. (Smalltalk globals instVarNamed: 'contents'))


Anybody have any ideas for how to get a better profile?

Colin

BTW, what is the body of that block, if you can make it public? 
-- 
best,
Eliot


Reply | Threaded
Open this post in threaded view
|

Re: Profiling

Andreas.Raab
In reply to this post by Levente Uzonyi-2
Levente Uzonyi-2 wrote
It uses some primitives which were added to the CogVM, so it requires Cog.
We could reimplement it from scratch using the same primitives and
get a better profiler in the image. We can't include this due to the GPL
license.
Or ask Ron if 3dicc (which bought all the rights to the Teleplace IP and Software) can re-release the code under a more permissable license.

Cheers,
  - Andreas
Reply | Threaded
Open this post in threaded view
|

Inclusive licensing (was: [squeak-dev] Re: Profiling)

David T. Lewis
On Fri, Jan 11, 2013 at 10:38:46AM -0800, Andreas.Raab wrote:

> Levente Uzonyi-2 wrote
> > It uses some primitives which were added to the CogVM, so it requires Cog.
> > We could reimplement it from scratch using the same primitives and
> > get a better profiler in the image. We can't include this due to the GPL
> > license.
>
> Or ask Ron if 3dicc (which bought all the rights to the Teleplace IP and
> Software) can re-release the code under a more permissable license.
>
> Cheers,
>   - Andreas
>

That sounds like a really good idea to me. I don't know the background on this
story, but I should think that allowing the Squeak community to more freely
participate in OpenQwaq would be a win for all involved. There is some very
cool stuff in QpenQwaq and it's a shame if it does not get attention due to
silly license constraints.

Anyone care to take the hint and see if something might be done here?

Dave
 

Reply | Threaded
Open this post in threaded view
|

Re: Profiling

Colin Putney-3
In reply to this post by Eliot Miranda-2



On Thu, Jan 10, 2013 at 2:08 PM, Eliot Miranda <[hidden email]> wrote:
 
However, if all the associationsDo: block does is invoke a primitive then the Squeak profiler will not produce useful info.  The default profiler doesn't profile primitives correctly.  As others in this thread have pointed out there;s a better profiler written by Andreas Raab, which does profile primitives properly.

That's the issue, alright.

Here's the code:

adds keysAndValuesDo:
[:key :vector |
repository add: key vector: vector.
vector becomeForward: nil copyHash: false].
 
I had assumed that #add:vector: was the culprit, because it does serialization, which involves walking the object graph rooted in the vector and lots of double-dispatch. I found the FFI call and commented it out, and running under the interpreter I was able to see all the serialization in the profile. It was still a tiny proportion of the total time, but without Cog to speed it up, it was at least showing up on the profile! That made me realize how much time was being spent in primitives.

The fix is easy—just batch up the the objects to become, then do them all at once with #elementsForwardIdentityTo:copyHash:. 

Thanks all, for the help, and apologies for being slow to respond. I've been without internet for the last few days.

Colin