Smalltalk › Squeak › Squeak - Dev

[squeak-dev] How to profile a server image?

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

47 messages Options

123

Herbert König

[squeak-dev] How to profile a server image?

Hi,

say I have a web server with several users logged into different
sites.

What do i have to start a MessageTally on to see where the image
spends its time?

Thanks,

Herbert

Andreas.Raab

[squeak-dev] Re: How to profile a server image?

[Apologies for cross-posting but I figured some of the more interested
audience may be on the Seaside list. What follows below is a summary of
my experiences with server profiling over the last couple of years which
I've done a goodly bit of]

Herbert König wrote:
> say I have a web server with several users logged into different
> sites.
>
> What do i have to start a MessageTally on to see where the image
> spends its time?

This is tricky with MessageTally. There are several issues to keep in
mind, some of which you can fix, some of which you can't. Here we go:

1) Observing all Processes
--------------------------
First, MessageTally only observes the block you are running, not the
entire system. For example, if you were to profile this:

done := Semaphore new.
worker:= [
10 timesRepeat:[World changed; displayWorld].
done signal.
] fork.
MessageTally spyOn:[done wait].

This is what you'd get:

**Tree**
100.0% {2367ms} primitives

**Leaves**
100.0% {2367ms} UndefinedObject>>DoIt

Obviously it's not actually measuring what is going on during the period
of time which is a real problem if you are trying to measure server load
in general. But it's fixable.

2) Priority issues
------------------
MessageTally uses userInterruptPriority and won't see anything running
(for example) at lowIOPriority or higher. This is quite common for
server images for handling incoming requests etc.

The good news is, both of the above issues can be fixed easily. Just
file in the attached patch for MessageTally (I will also post this on
Mantis) and it will report the work being done accurately when you use
it via:

MessageTally spyAt: Processor timingPriority-1 on:[done wait].

The priority argument allows you to specify at which priority to run
your spy so that you can measure what you need to measure (a word of
warning though: You mustn't try to run at #timingPriority as this will
interfere with the timing mechanism itself).

With these changes you can run a tally on, e.g.,

MessageTally spyOn:[(Delay forSeconds: 10) wait].

and see what happens over the next 10 seconds. This is effectively what
we do for profiling on our server images.

3) VM timing problems
---------------------
Unfortunately, recent VMs have a problem with spy accuracy since an
important timer check was removed from the VM (I didn't notice that at
the time it happened). This *heavily* affects the profiler, for example,
try this:

MessageTally spyOn:[
1 to: 100 do:[:i|
Display fillWhite.
'Hello', ' World'.
].
].

If you have a recent VM (I haven't tracked how far back this goes) the
result is this:

**Tree**
100.0% {534ms} ByteString(SequenceableCollection)>>,

**Leaves**
100.0% {534ms} ByteString(SequenceableCollection)>>,

The problem is caused by the VM no longer doing a timer check after a
primitive has been run. Consequently, the time spent in any long-running
primitive (like BitBlt) is allocated to the *next real send* when an
interrupt check occurs (as you can see in the above). This can be
extremely confusing. If you want even remotely accurate profiles I
recommend going back a few generations in VMs and pick one that behaves
properly in this regard. For example, running the above in 3.6 I get:

**Tree**
100.0% {218ms} DisplayScreen(DisplayMedium)>>fillWhite
100.0% {218ms} DisplayScreen(DisplayMedium)>>fill:fillColor:
100.0% {218ms} DisplayScreen(Form)>>fill:rule:fillColor:
100.0% {218ms} BitBlt>>copy:from:in:fillColor:rule:

**Leaves**
100.0% {218ms} BitBlt>>copy:from:in:fillColor:rule:

Obviously, this is the correct tally. [Hm ... but this points out
another issue. For some reason 3.6 runs this in half the total time???
What the heck... I'll have to look at this]

You *must* use a VM that doesn't have the above problem or the results
will be completely confusing.

4) Measuring load with (sub-)msecs responses
--------------------------------------------
Lastly, server images can have a particular issue if they are set up to
do relatively fast request handling. For example, our routers generally
act in sub-msecs responses. When you start putting load on servers that
react that quickly, your whole msec-based sampling approach goes out of
the window.

What's worse than that is that if (some of) your server activities are
driven by the clock (using Delay etc) running under load can produce
*utter* garbage result. Another example:

blamedProcess := [
[true] whileTrue:[(Delay forMilliseconds: 1) wait].
] forkAt: Processor userInterruptPriority.

[MessageTally spyAt: Processor timingPriority -1 on:[
(Delay forSeconds: 5) wait.
]] ensure:[blamedProcess terminate]

When you run this, MessageTally will tell you that 100% CPU was spent in
Delay wait for the blamedProcess. If you look at CPU utilization you
should see practically no CPU activity. How come? When the timer event
fires, it puts both message tally's profiling process as well as the
blamed process in the runnable queue. MessageTally (being at higher
priority) will go first and see that the process that is currently
active is the blamed process. It will assume that the entire last msec
was spent in it since it has no other timing base to use.

Be wary that the above can happen in many, many variations and depends
heavily on your system's timer accuracy (if you want to play with this,
increase the blamedProcess' wait time step by step and compare the
reported CPU utilization various platforms). In general, when you see
Delay wait show up in your tallies you can be sure there is something
wrong with the measure.

Unfortunately, there is nothing much you can do about this other than be
aware of it and wait. Because we ran into the same issues I have written
a better profiler at Qwaq (which is both sub-msecs as well as
primitive-aware) and I think we'll be releasing the necessary support
for this in Cog but this will still take a while. For comparison, when
you run our profiler on the above code you get:

Reporting - 45,975 tallies, 5,000 msec.

**Tree**
98.9 (4,945) ProcessorScheduler class startUp
98.9 (4,945) ProcessorScheduler class idleProcess
98.9 (4,945) ProcessorScheduler class
relinquishProcessorForMicroseconds:

**Leaves**
98.9 (4,945) ProcessorScheduler class relinquishProcessorForMicroseconds:

So it correctly reports that the code was doing nothing (executing the
idle process).

5) Summary
----------
In order to profile your server you definitely need to take of 1)
through 3) or else you won't be able to find out anything. With these
issues taken care of, just add a remotely executable action that runs
message tally for the given number of seconds (we have this set up on
our servers so that when you go to /profile?secs=xxx it runs the
profiler for that amount of time and spits out the result).

Unless you do work that has very fast responses and is designed to work
with high load this should be good enough to get you a pretty good
understanding of what is going on.

Hope this helps!

Cheers,
- Andreas

MessageTallyEnh-ar.1.cs (2K) Download Attachment

Herbert König

Re: [squeak-dev] Re: How to profile a server image?

Hello Andreas,

AR> [Apologies for cross-posting but I figured some of the more interested
AR> audience may be on the Seaside list. What follows below is a summary of
AR> my experiences with server profiling over the last couple of years which
AR> I've done a goodly bit of]

big thanks! Having spent an hour or so with trying and finding my
first clues in the process browser in Socket>>waitforDataIfClosed:
I decided not to ask on the Aida list but here.

But I'll post a link to your reply on the Aida list too.

With your fix and

MessageTally spyAt: Processor timingPriority-1
on:[(Delay forSeconds: 30) wait]

I'm still confused by the result.

I'm using Aida and I open two web apps in two tabs of the web browser.
Then I just switch tabs alternating and refresh the browser.

So I'd expect both WebApps to show up in the tally but I only see the
app of which ever tab in the web browser I refresh first. The
remainder of the tally seems plausible to me except where noted in the
tally.

The server is a slow machine the client with the browser is fast and
connects via Lan. A refresh takes 5 seconds. Server uses lots of CPU
during this.

Do you think it's time to go to the Aida list, or is this still a
tally problem I have?

Squeak 3.8.2 with Win 3.10.10 VM.

Cheers,

Herbert mailto:[hidden email]

Spy Results.txt (6K) Download Attachment

johnmci

Re: [squeak-dev] Re: How to profile a server image?

In reply to this post by Andreas.Raab

Let me share one more irritation fix.

If you have one of those nice 35 inch or so LCD screens, then having a
much wider message tally report
is rather nice.

MessageTally>>defaultMaxTabs
"Return the default number of tabs after which leading white space is
compressed"
^40

On 28-Feb-09, at 10:45 AM, Andreas Raab wrote:
>

--
=
=
=
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
=
=
=
========================================================================

johnmci

Re: [squeak-dev] Re: How to profile a server image?

In reply to this post by Andreas.Raab

On 28-Feb-09, at 10:45 AM, Andreas Raab wrote:

> ---------------------
> Unfortunately, recent VMs have a problem with spy accuracy since an
> important timer check was removed from the VM (I didn't notice that
> at the time it happened). This *heavily* affects the profiler, for
> example, try this:
>
> MessageTally spyOn:[
> 1 to: 100 do:[:i|
> Display fillWhite.
> 'Hello', ' World'.
> ].
> ].

Ah you mean perhaps the removal of the checkForInterrupts() which
*could/might* occur after the call in

dispatchFunctionPointer

I'm just guessing of course, mind adding in a clock check on every
primitive call will result in a small amount of overhead.

--
=
=
=
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
=
=
=
========================================================================

johnmci

Re: [squeak-dev] Re: How to profile a server image?

Oops you CANN'T just call checkForInterrupts from
dispatchFunctionPointer that makes squirrel food of object memory
later in time (no idea why).

The safe way is to get the before and after time, then if they are
different then call forceInterruptCheck

On 28-Feb-09, at 3:34 PM, John M McIntosh wrote:

>
> On 28-Feb-09, at 10:45 AM, Andreas Raab wrote:
>
>> ---------------------
>> Unfortunately, recent VMs have a problem with spy accuracy since an
>> important timer check was removed from the VM (I didn't notice that
>> at the time it happened). This *heavily* affects the profiler, for
>> example, try this:
>>
>> MessageTally spyOn:[
>> 1 to: 100 do:[:i|
>> Display fillWhite.
>> 'Hello', ' World'.
>> ].
>> ].
>
>
> Ah you mean perhaps the removal of the checkForInterrupts() which
> *could/might* occur after the call in
>
> dispatchFunctionPointer
>
> I'm just guessing of course, mind adding in a clock check on every
> primitive call will result in a small amount of overhead.
>
>
> --
> =
> =
> =
> =
> =
> ======================================================================
> John M. McIntosh <[hidden email]>
> Corporate Smalltalk Consulting Ltd. http://
> www.smalltalkconsulting.com
> =
> =
> =
> =
> =
> ======================================================================
>
>
>

--
=
=
=
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
=
=
=
========================================================================

johnmci

Re: [squeak-dev] Re: How to profile a server image?

Mmmm guessing shows that ensuring forceInterruptCheck is called
after dispatchFunctionPointer DOESN'T make the profile correct.
So I'll wait for the true answer...

On 28-Feb-09, at 4:35 PM, John M McIntosh wrote:

> Oops you CANN'T just call checkForInterrupts from
> dispatchFunctionPointer that makes squirrel food of object memory
> later in time (no idea why).
>
> The safe way is to get the before and after time, then if they are
> different then call forceInterruptCheck
>

--
=
=
=
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
=
=
=
========================================================================

Andreas.Raab

[squeak-dev] Re: How to profile a server image?

In reply to this post by Herbert König

Herbert König wrote:
> With your fix and
>
> MessageTally spyAt: Processor timingPriority-1
> on:[(Delay forSeconds: 30) wait]
>
> I'm still confused by the result.

Yes, if you are new to this game a bit of learning is required.

> I'm using Aida and I open two web apps in two tabs of the web browser.
> Then I just switch tabs alternating and refresh the browser.
>
> So I'd expect both WebApps to show up in the tally but I only see the
> app of which ever tab in the web browser I refresh first. The
> remainder of the tally seems plausible to me except where noted in the
> tally.

I can't tell you why you are seeing one app but keep in mind that if the
call structure of both apps are very similar (which it usually is since
it is being called from the same framework) the tree gets coalesced and
you would only see one call tree until the apps actually do different
things.

> The server is a slow machine the client with the browser is fast and
> connects via Lan. A refresh takes 5 seconds. Server uses lots of CPU
> during this.
>
> Do you think it's time to go to the Aida list, or is this still a
> tally problem I have?

Some of it is definitely a problem you have.

> Squeak 3.8.2 with Win 3.10.10 VM.

That is your first problem 3.10.10 VM has the tallying problem. You need
to go back in time find a VM which doesn't have the problem. Run the
example that I sent in the first message and make sure it generates the
right results.

???????????????????????????? Why does this eat so much?
61.2% {18360ms} ProcessorScheduler class>>startUp
|61.2% {18360ms} ProcessorScheduler class>>idleProcess

This simply means is that the process was to 60% idle. This is
accumulated over time so if you've waited for five seconds then did
something for another five it would show up at roughly 50% idle.

Cheers,
- Andreas

johnmci

Re: [squeak-dev] Re: How to profile a server image?

On 1-Mar-09, at 10:16 AM, Andreas Raab wrote:

> That is your first problem 3.10.10 VM has the tallying problem. You
> need to go back in time find a VM which doesn't have the problem.
> Run the example that I sent in the first message and make sure it
> generates the right results.

So what's the VM change?

--
=
=
=
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
=
=
=
========================================================================

Andreas.Raab

[squeak-dev] Re: How to profile a server image?

John M McIntosh wrote:

>
> On 1-Mar-09, at 10:16 AM, Andreas Raab wrote:
>
>> That is your first problem 3.10.10 VM has the tallying problem. You
>> need to go back in time find a VM which doesn't have the problem. Run
>> the example that I sent in the first message and make sure it
>> generates the right results.
>
>
> So what's the VM change?

primitiveResponse
"Details: Since primitives can run for a long time, we must check to
see if it is time to process a timer interrupt. However, on the Mac, the
high-resolution millisecond clock is expensive to poll. Thus, we use a
fast, low-resolution (1/60th second) clock to determine if the primitive
took enough time to justify polling the high-resolution clock. Seems
Byzantine, but Bob Arning showed that the performance of
primitive-intensive code decreased substantially if there was another
process waiting on a Delay.
One other detail: If the primitive fails, we want to postpone the timer
interrupt until just after the primitive failure code has been entered.
This is accomplished by setting the interrupt check counter to zero,
thus triggering a check for interrupts when activating the method
containing the primitive."

| timerPending startTime delta primIdx nArgs |
timerPending _ nextWakeupTick ~= 0.
timerPending ifTrue: [startTime _ self ioLowResMSecs].
DoBalanceChecks ifTrue:["check stack balance"
nArgs _ argumentCount.
primIdx _ primitiveIndex.
delta _ stackPointer - activeContext.
].
successFlag _ true.
self dispatchOn: primitiveIndex in: PrimitiveTable.
DoBalanceChecks ifTrue:[
(self balancedStack: delta afterPrimitive: primIdx withArgs: nArgs)
ifFalse:[self printUnbalancedStack: primIdx].
].
timerPending ifTrue: [
(self ioLowResMSecs ~= startTime) ifTrue: [
"primitive ran for more than a tick; check for possible timer interrupts"
((self ioMSecs bitAnd: MillisecondClockMask) >= nextWakeupTick) ifTrue: [
successFlag
ifTrue: ["process the interrupt now"
self checkForInterrupts]
ifFalse: ["process the interrupt in primtive failure code"
interruptCheckCounter _ 0]]]].
^ successFlag

Herbert König

Re: [squeak-dev] Re: How to profile a server image?

In reply to this post by Andreas.Raab

Hello Andreas,

3.7.1 VM does not have the timing problem, 3.9.1 has it.

3.7.1 VM does not start my 3.8.2 image with a running web server.
Actually it does not run any 3.8.2 image but it runs a standard
3.8-6665 full image.

Subsequently updating 3.8-6665 to 3.8.2 makes 3.7.1 VM start the image.

3.9.1 does start 3.8.2 with running Aida web server.

AR> Yes, if you are new to this game a bit of learning is required.
AR> Some of it is definitely a problem you have.
thanks a lot, I'll continue learning!

Cheers

Herbert mailto:[hidden email]

Herbert König

Re[2]: [squeak-dev] Re: How to profile a server image?

Hi,

HK> 3.7.1 VM does not start my 3.8.2 image with a running web server.
HK> Actually it does not run any 3.8.2 image but it runs a standard
HK> 3.8-6665 full image.

that was wrong, doesn't start several 3.8.2 images but I found one
with several packages loaded that starts.

Have to investigate more.

Cheers

Herbert mailto:[hidden email]

keith1y

Re: [squeak-dev] Re: How to profile a server image?

In reply to this post by Andreas.Raab

The Logging package is able to set up per process logs. I have been
planning to write a per-logged-in-user logging facility as an add on for
for seaside.

Logging has facilities for timeing things, and the time stamps in 3.10.2
have millisecond resolution. I also have some ruby code somewhere which
can trawl through logs, and generate data for plotting graphs.

Its not as clever as proper profiling, but its a potential option.

Keith

Juan Vuletich-4

Re: [squeak-dev] Re: How to profile a server image?

In reply to this post by Andreas.Raab

Hi Andreas,

Andreas Raab wrote:

> ...
> This is tricky with MessageTally. There are several issues to keep in
> mind, some of which you can fix, some of which you can't. Here we go:
>
> 1) Observing all Processes
> --------------------------
> First, MessageTally only observes the block you are running, not the
> entire system. For example, if you were to profile this:
>
> done := Semaphore new.
> worker:= [
> 10 timesRepeat:[World changed; displayWorld].
> done signal.
> ] fork.
> MessageTally spyOn:[done wait].
>
> This is what you'd get:
>
> **Tree**
> 100.0% {2367ms} primitives
>
> **Leaves**
> 100.0% {2367ms} UndefinedObject>>DoIt
>
> Obviously it's not actually measuring what is going on during the
> period of time which is a real problem if you are trying to measure
> server load in general. But it's fixable.
> ...

I found all this stuff very interesting and useful. Thanks!

WRT issue 1), what I'd really like is the tally tree to have several
roots, one for each forked process. Otherwise, the process that forks
other processes would also have their tallies added to him (which is
wrong, as it would appear to be using more time than it really did).
Fixing this, MessageTally could give a better insight on cpu usage than
#tallyCPUUsageFor:.

To do this, I'd need to find out for a certain context in the sender
chain, on which process it was running. (Then, when building the tally
tree, I could know that I need to add a new root.) Do you know how to
find out? I spent a couple of hours on this, and it seems it is not
possible...

Thanks,
Juan Vuletich

Ps. I found that the problem with long primitives first appears in your
3.9.1 VM, and that BitBlt seems to slow down a bit in 3.7.1 and even
more in 3.9.1. Are you planning to address this issues in the Windows VM?

Eliot Miranda-2

Re: [squeak-dev] Re: How to profile a server image?

Hi Juan,

On Tue, Mar 3, 2009 at 5:28 AM, Juan Vuletich <[hidden email]> wrote:

Hi Andreas,

Andreas Raab wrote:

...

This is tricky with MessageTally. There are several issues to keep in mind, some of which you can fix, some of which you can't. Here we go:

1) Observing all Processes
--------------------------
First, MessageTally only observes the block you are running, not the entire system. For example, if you were to profile this:

done := Semaphore new.
worker:= [
10 timesRepeat:[World changed; displayWorld].
done signal.
] fork.
MessageTally spyOn:[done wait].

This is what you'd get:

**Tree**
100.0% {2367ms} primitives

**Leaves**
100.0% {2367ms} UndefinedObject>>DoIt

Obviously it's not actually measuring what is going on during the period of time which is a real problem if you are trying to measure server load in general. But it's fixable.
...

I found all this stuff very interesting and useful. Thanks!

WRT issue 1), what I'd really like is the tally tree to have several roots, one for each forked process. Otherwise, the process that forks other processes would also have their tallies added to him (which is wrong, as it would appear to be using more time than it really did). Fixing this, MessageTally could give a better insight on cpu usage than #tallyCPUUsageFor:.

To do this, I'd need to find out for a certain context in the sender chain, on which process it was running. (Then, when building the tally tree, I could know that I need to add a new root.) Do you know how to find out? I spent a couple of hours on this, and it seems it is not possible...

When MessageTally runs to collect each tally the process that has been interrupted is the highest priority runnable process in the runnable process lists in ProcessorScheduler. You could implement it like this:

!ProcessorScheduler methodsFor: 'accessing' stamp: 'eem 3/3/2009 10:41'!

highestPriorityRunnableProcess

[quiescentProcessLists reverseDo:

[:each| each isEmpty ifFalse: [^each first]]] valueUnpreemptively.

^nil

"| thisProcess interruptedProcess done |

thisProcess := Processor activeProcess.

done := false.

[(Delay forSeconds: 1) wait.

interruptedProcess := Processor highestPriorityRunnableProcess.

done := true] forkAt: Processor userInterruptPriority.

[done] whileFalse.

self assert: thisProcess == interruptedProcess"! !

So modify MessageTally (or better still create a subclass called MultiProcessMessageTally) that uses the above to manage a set of tallies for each process found while spying.

HTH

Thanks,
Juan Vuletich

Ps. I found that the problem with long primitives first appears in your 3.9.1 VM, and that BitBlt seems to slow down a bit in 3.7.1 and even more in 3.9.1. Are you planning to address this issues in the Windows VM?

Juan Vuletich-4

Re: [squeak-dev] Re: How to profile a server image?

Hi Eliot,

Eliot Miranda wrote:

> Hi Juan,
>
> On Tue, Mar 3, 2009 at 5:28 AM, Juan Vuletich <[hidden email]
> <mailto:[hidden email]>> wrote:
>
> Hi Andreas,
>
> Andreas Raab wrote:
>
> ...
>
> This is tricky with MessageTally. There are several issues to
> keep in mind, some of which you can fix, some of which you
> can't. Here we go:
>
> 1) Observing all Processes
> --------------------------
> First, MessageTally only observes the block you are running,
> not the entire system. For example, if you were to profile this:
>
> done := Semaphore new.
> worker:= [
> 10 timesRepeat:[World changed; displayWorld].
> done signal.
> ] fork.
> MessageTally spyOn:[done wait].
>
> This is what you'd get:
>
> **Tree**
> 100.0% {2367ms} primitives
>
> **Leaves**
> 100.0% {2367ms} UndefinedObject>>DoIt
>
> Obviously it's not actually measuring what is going on during
> the period of time which is a real problem if you are trying
> to measure server load in general. But it's fixable.
> ...
>
> I found all this stuff very interesting and useful. Thanks!
>
> WRT issue 1), what I'd really like is the tally tree to have
> several roots, one for each forked process. Otherwise, the process
> that forks other processes would also have their tallies added to
> him (which is wrong, as it would appear to be using more time than
> it really did). Fixing this, MessageTally could give a better
> insight on cpu usage than #tallyCPUUsageFor:.
>
> To do this, I'd need to find out for a certain context in the
> sender chain, on which process it was running. (Then, when
> building the tally tree, I could know that I need to add a new
> root.) Do you know how to find out? I spent a couple of hours on
> this, and it seems it is not possible...
>
>
> When MessageTally runs to collect each tally the process that has been
> interrupted is the highest priority runnable process in the runnable
> process lists in ProcessorScheduler. You could implement it like this:
>
> !ProcessorScheduler methodsFor: 'accessing' stamp: 'eem 3/3/2009 10:41'!
> highestPriorityRunnableProcess
> [quiescentProcessLists reverseDo:
> [:each| each isEmpty ifFalse: [^each first]]] valueUnpreemptively.
> ^nil
>
> "| thisProcess interruptedProcess done |
> thisProcess := Processor activeProcess.
> done := false.
> [(Delay forSeconds: 1) wait.
> interruptedProcess := Processor highestPriorityRunnableProcess.
> done := true] forkAt: Processor userInterruptPriority.
> [done] whileFalse.
> self assert: thisProcess == interruptedProcess"! !
>
> So modify MessageTally (or better still create a subclass called
> MultiProcessMessageTally) that uses the above to manage a set of
> tallies for each process found while spying.
>
> HTH
>

What you say sounds similar to what Andreas suggests to make
MessageTally spy over all processes. The issue I point out is that when
MessageTally
builds the tree, the sender of a context might be in another process.
This happens when a context forks a new process, it is still its sender.
So when building the tally tree, I need to query each context for the
process running it, and when it is different from the parent's one, I'll
start a new tally tree.

I made it work, by adding an ivar to Process to hold the firstContext
(the one that is sent to #forContext:priority:). Then, when building the
tally tree, for each context, I check if it is the firstContext of some
process. The problem with this approach is that it adds too much
overhead to the tally.

I'd really appreciate a better way to do this. I'm sure everybody will
like the multi-process tally that would result!

Cheers,
Juan Vuletich

Eliot Miranda-2

Re: [squeak-dev] Re: How to profile a server image?

On Tue, Mar 3, 2009 at 11:56 AM, Juan Vuletich <[hidden email]> wrote:

Hi Eliot,

Eliot Miranda wrote:

Hi Juan,

On Tue, Mar 3, 2009 at 5:28 AM, Juan Vuletich <[hidden email] <mailto:[hidden email]>> wrote:

Hi Andreas,

Andreas Raab wrote:

...

This is tricky with MessageTally. There are several issues to
keep in mind, some of which you can fix, some of which you
can't. Here we go:

1) Observing all Processes
--------------------------
First, MessageTally only observes the block you are running,
not the entire system. For example, if you were to profile this:

done := Semaphore new.
worker:= [
10 timesRepeat:[World changed; displayWorld].
done signal.
] fork.
MessageTally spyOn:[done wait].

This is what you'd get:

**Tree**
100.0% {2367ms} primitives

**Leaves**
100.0% {2367ms} UndefinedObject>>DoIt

Obviously it's not actually measuring what is going on during
the period of time which is a real problem if you are trying
to measure server load in general. But it's fixable.
...

I found all this stuff very interesting and useful. Thanks!

WRT issue 1), what I'd really like is the tally tree to have
several roots, one for each forked process. Otherwise, the process
that forks other processes would also have their tallies added to
him (which is wrong, as it would appear to be using more time than
it really did). Fixing this, MessageTally could give a better
insight on cpu usage than #tallyCPUUsageFor:.

To do this, I'd need to find out for a certain context in the
sender chain, on which process it was running. (Then, when
building the tally tree, I could know that I need to add a new
root.) Do you know how to find out? I spent a couple of hours on
this, and it seems it is not possible...

When MessageTally runs to collect each tally the process that has been interrupted is the highest priority runnable process in the runnable process lists in ProcessorScheduler. You could implement it like this:

!ProcessorScheduler methodsFor: 'accessing' stamp: 'eem 3/3/2009 10:41'!
highestPriorityRunnableProcess
[quiescentProcessLists reverseDo:
[:each| each isEmpty ifFalse: [^each first]]] valueUnpreemptively.
^nil

"| thisProcess interruptedProcess done |
thisProcess := Processor activeProcess.
done := false.
[(Delay forSeconds: 1) wait.
interruptedProcess := Processor highestPriorityRunnableProcess.
done := true] forkAt: Processor userInterruptPriority.
[done] whileFalse.
self assert: thisProcess == interruptedProcess"! !

So modify MessageTally (or better still create a subclass called MultiProcessMessageTally) that uses the above to manage a set of tallies for each process found while spying.

HTH

What you say sounds similar to what Andreas suggests to make MessageTally spy over all processes. The issue I point out is that when MessageTally
builds the tree, the sender of a context might be in another process. This happens when a context forks a new process, it is still its sender. So when building the tally tree, I need to query each context for the process running it, and when it is different from the parent's one, I'll start a new tally tree.

I think I see but I wouldn't put it like that. Yes, new processes get resumed in the context of other proesses, but no, the sender is not in another process. The bottom context of a process has no sender. So why bother at all trying to track down in which process a process was created and leave that to the user when interpreting the profile?

I made it work, by adding an ivar to Process to hold the firstContext (the one that is sent to #forContext:priority:). Then, when building the tally tree, for each context, I check if it is the firstContext of some process. The problem with this approach is that it adds too much overhead to the tally.

IMO just throw this away. You don't absolutely need to know on behalf of which process a process is running.

I'd really appreciate a better way to do this. I'm sure everybody will like the multi-process tally that would result!

I believe the VW multi-process profiler doesn't bother identifying the parent process. But if it does it can do so by e.g. adding an inst var to process that refers to the parent process, rather than the context in which the process was created. But both recording the context or the parent process are bad ideas for garbage collection.

I would simply use the process name facility (Process>>name[:] as displayed by the process browser) to label the various tallies you collect. If a user can't work out which process is which by just looking at the the profile ten they can change their code to use Process>>name: to add names to various processes and then be able to wrk it out.

just my 2¢

best

Cheers,
Juan Vuletich

Andreas.Raab

[squeak-dev] Re: How to profile a server image?

In reply to this post by Juan Vuletich-4

Juan Vuletich wrote:
> What you say sounds similar to what Andreas suggests to make
> MessageTally spy over all processes. The issue I point out is that when
> MessageTally
> builds the tree, the sender of a context might be in another process.
> This happens when a context forks a new process, it is still its sender.
> So when building the tally tree, I need to query each context for the
> process running it, and when it is different from the parent's one, I'll
> start a new tally tree.

I'm not sure how useful this is. When it comes to measuring where the
time goes, the cumulative tree of multiple processes shows you if you
have deficiencies in the core framework handling. This is extremely
useful. And it breaks down quite nicely into separate pieces where
actual different work is involved. I don't think you would get much more
useful information if you were to break things apart. And I can say for
sure that for the purposes of our server profiling it would be pretty
much useless - under load we run literally hundreds of processes.

Cheers,
- Andreas

Igor Stasenko

Re: [squeak-dev] Re: How to profile a server image?

Guys, can i ask you to put these recommendations on wiki page, so this
VERY USEFUL guide will not be buried in mailing list and can serve as
a basic guide for profiling in Squeak?

--
Best regards,
Igor Stasenko AKA sig.

Andreas.Raab

[squeak-dev] Re: How to profile a server image?

Igor Stasenko wrote:
> Guys, can i ask you to put these recommendations on wiki page, so this
> VERY USEFUL guide will not be buried in mailing list and can serve as
> a basic guide for profiling in Squeak?

Where would you put it? I have never found my way through the Wiki. If
it's helpful for others I'd be happy to put it up, just tell me where
you'd like it to go.

Cheers,
- Andreas

123