Smalltalk › Frameworks & Tools › Seaside › Seaside General

Re: How to profile a server image?

_‹ Previous Topic Next Topic _›

Classic

List

Threaded

5 messages Options

Andreas.Raab

Re: How to profile a server image?

[Apologies for cross-posting but I figured some of the more interested
audience may be on the Seaside list. What follows below is a summary of
my experiences with server profiling over the last couple of years which
I've done a goodly bit of]

Herbert König wrote:
> say I have a web server with several users logged into different
> sites.
>
> What do i have to start a MessageTally on to see where the image
> spends its time?

This is tricky with MessageTally. There are several issues to keep in
mind, some of which you can fix, some of which you can't. Here we go:

1) Observing all Processes
--------------------------
First, MessageTally only observes the block you are running, not the
entire system. For example, if you were to profile this:

done := Semaphore new.
worker:= [
10 timesRepeat:[World changed; displayWorld].
done signal.
] fork.
MessageTally spyOn:[done wait].

This is what you'd get:

**Tree**
100.0% {2367ms} primitives

**Leaves**
100.0% {2367ms} UndefinedObject>>DoIt

Obviously it's not actually measuring what is going on during the period
of time which is a real problem if you are trying to measure server load
in general. But it's fixable.

2) Priority issues
------------------
MessageTally uses userInterruptPriority and won't see anything running
(for example) at lowIOPriority or higher. This is quite common for
server images for handling incoming requests etc.

The good news is, both of the above issues can be fixed easily. Just
file in the attached patch for MessageTally (I will also post this on
Mantis) and it will report the work being done accurately when you use
it via:

MessageTally spyAt: Processor timingPriority-1 on:[done wait].

The priority argument allows you to specify at which priority to run
your spy so that you can measure what you need to measure (a word of
warning though: You mustn't try to run at #timingPriority as this will
interfere with the timing mechanism itself).

With these changes you can run a tally on, e.g.,

MessageTally spyOn:[(Delay forSeconds: 10) wait].

and see what happens over the next 10 seconds. This is effectively what
we do for profiling on our server images.

3) VM timing problems
---------------------
Unfortunately, recent VMs have a problem with spy accuracy since an
important timer check was removed from the VM (I didn't notice that at
the time it happened). This *heavily* affects the profiler, for example,
try this:

MessageTally spyOn:[
1 to: 100 do:[:i|
Display fillWhite.
'Hello', ' World'.
].
].

If you have a recent VM (I haven't tracked how far back this goes) the
result is this:

**Tree**
100.0% {534ms} ByteString(SequenceableCollection)>>,

**Leaves**
100.0% {534ms} ByteString(SequenceableCollection)>>,

The problem is caused by the VM no longer doing a timer check after a
primitive has been run. Consequently, the time spent in any long-running
primitive (like BitBlt) is allocated to the *next real send* when an
interrupt check occurs (as you can see in the above). This can be
extremely confusing. If you want even remotely accurate profiles I
recommend going back a few generations in VMs and pick one that behaves
properly in this regard. For example, running the above in 3.6 I get:

**Tree**
100.0% {218ms} DisplayScreen(DisplayMedium)>>fillWhite
100.0% {218ms} DisplayScreen(DisplayMedium)>>fill:fillColor:
100.0% {218ms} DisplayScreen(Form)>>fill:rule:fillColor:
100.0% {218ms} BitBlt>>copy:from:in:fillColor:rule:

**Leaves**
100.0% {218ms} BitBlt>>copy:from:in:fillColor:rule:

Obviously, this is the correct tally. [Hm ... but this points out
another issue. For some reason 3.6 runs this in half the total time???
What the heck... I'll have to look at this]

You *must* use a VM that doesn't have the above problem or the results
will be completely confusing.

4) Measuring load with (sub-)msecs responses
--------------------------------------------
Lastly, server images can have a particular issue if they are set up to
do relatively fast request handling. For example, our routers generally
act in sub-msecs responses. When you start putting load on servers that
react that quickly, your whole msec-based sampling approach goes out of
the window.

What's worse than that is that if (some of) your server activities are
driven by the clock (using Delay etc) running under load can produce
*utter* garbage result. Another example:

blamedProcess := [
[true] whileTrue:[(Delay forMilliseconds: 1) wait].
] forkAt: Processor userInterruptPriority.

[MessageTally spyAt: Processor timingPriority -1 on:[
(Delay forSeconds: 5) wait.
]] ensure:[blamedProcess terminate]

When you run this, MessageTally will tell you that 100% CPU was spent in
Delay wait for the blamedProcess. If you look at CPU utilization you
should see practically no CPU activity. How come? When the timer event
fires, it puts both message tally's profiling process as well as the
blamed process in the runnable queue. MessageTally (being at higher
priority) will go first and see that the process that is currently
active is the blamed process. It will assume that the entire last msec
was spent in it since it has no other timing base to use.

Be wary that the above can happen in many, many variations and depends
heavily on your system's timer accuracy (if you want to play with this,
increase the blamedProcess' wait time step by step and compare the
reported CPU utilization various platforms). In general, when you see
Delay wait show up in your tallies you can be sure there is something
wrong with the measure.

Unfortunately, there is nothing much you can do about this other than be
aware of it and wait. Because we ran into the same issues I have written
a better profiler at Qwaq (which is both sub-msecs as well as
primitive-aware) and I think we'll be releasing the necessary support
for this in Cog but this will still take a while. For comparison, when
you run our profiler on the above code you get:

Reporting - 45,975 tallies, 5,000 msec.

**Tree**
98.9 (4,945) ProcessorScheduler class startUp
98.9 (4,945) ProcessorScheduler class idleProcess
98.9 (4,945) ProcessorScheduler class
relinquishProcessorForMicroseconds:

**Leaves**
98.9 (4,945) ProcessorScheduler class relinquishProcessorForMicroseconds:

So it correctly reports that the code was doing nothing (executing the
idle process).

5) Summary
----------
In order to profile your server you definitely need to take of 1)
through 3) or else you won't be able to find out anything. With these
issues taken care of, just add a remotely executable action that runs
message tally for the given number of seconds (we have this set up on
our servers so that when you go to /profile?secs=xxx it runs the
profiler for that amount of time and spits out the result).

Unless you do work that has very fast responses and is designed to work
with high load this should be good enough to get you a pretty good
understanding of what is going on.

Hope this helps!

Cheers,
- Andreas

_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

MessageTallyEnh-ar.1.cs (2K) Download Attachment

johnmci

Re: [squeak-dev] Re: How to profile a server image?

Let me share one more irritation fix.

If you have one of those nice 35 inch or so LCD screens, then having a
much wider message tally report
is rather nice.

MessageTally>>defaultMaxTabs
"Return the default number of tabs after which leading white space is
compressed"
^40

On 28-Feb-09, at 10:45 AM, Andreas Raab wrote:
>

--
=
=
=
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
=
=
=
========================================================================

_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

johnmci

Re: [squeak-dev] Re: How to profile a server image?

In reply to this post by Andreas.Raab

On 28-Feb-09, at 10:45 AM, Andreas Raab wrote:

> ---------------------
> Unfortunately, recent VMs have a problem with spy accuracy since an
> important timer check was removed from the VM (I didn't notice that
> at the time it happened). This *heavily* affects the profiler, for
> example, try this:
>
> MessageTally spyOn:[
> 1 to: 100 do:[:i|
> Display fillWhite.
> 'Hello', ' World'.
> ].
> ].

Ah you mean perhaps the removal of the checkForInterrupts() which
*could/might* occur after the call in

dispatchFunctionPointer

I'm just guessing of course, mind adding in a clock check on every
primitive call will result in a small amount of overhead.

--
=
=
=
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
=
=
=
========================================================================

_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

johnmci

Re: [squeak-dev] Re: How to profile a server image?

Oops you CANN'T just call checkForInterrupts from
dispatchFunctionPointer that makes squirrel food of object memory
later in time (no idea why).

The safe way is to get the before and after time, then if they are
different then call forceInterruptCheck

On 28-Feb-09, at 3:34 PM, John M McIntosh wrote:

>
> On 28-Feb-09, at 10:45 AM, Andreas Raab wrote:
>
>> ---------------------
>> Unfortunately, recent VMs have a problem with spy accuracy since an
>> important timer check was removed from the VM (I didn't notice that
>> at the time it happened). This *heavily* affects the profiler, for
>> example, try this:
>>
>> MessageTally spyOn:[
>> 1 to: 100 do:[:i|
>> Display fillWhite.
>> 'Hello', ' World'.
>> ].
>> ].
>
>
> Ah you mean perhaps the removal of the checkForInterrupts() which
> *could/might* occur after the call in
>
> dispatchFunctionPointer
>
> I'm just guessing of course, mind adding in a clock check on every
> primitive call will result in a small amount of overhead.
>
>
> --
> =
> =
> =
> =
> =
> ======================================================================
> John M. McIntosh <[hidden email]>
> Corporate Smalltalk Consulting Ltd. http://
> www.smalltalkconsulting.com
> =
> =
> =
> =
> =
> ======================================================================
>
>
>

--
=
=
=
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
=
=
=
========================================================================

_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside

johnmci

Re: [squeak-dev] Re: How to profile a server image?

Mmmm guessing shows that ensuring forceInterruptCheck is called
after dispatchFunctionPointer DOESN'T make the profile correct.
So I'll wait for the true answer...

On 28-Feb-09, at 4:35 PM, John M McIntosh wrote:

> Oops you CANN'T just call checkForInterrupts from
> dispatchFunctionPointer that makes squirrel food of object memory
> later in time (no idea why).
>
> The safe way is to get the before and after time, then if they are
> different then call forceInterruptCheck
>

--
=
=
=
========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
=
=
=
========================================================================

_______________________________________________
seaside mailing list
[hidden email]
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside