Simple lock up with delay + semaphore - not fixed with 0006576

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Simple lock up with delay + semaphore - not fixed with 0006576

Georg Köster
Hi all,

I'm trying around with Croquet since a short while and stumbled over a bug in TPromise, but after a first fix there I also found that I could lock up my squeak (3.8.18beta1U on os x 10.5) as simply as this:

|s b c|
s := Semaphore new.
b := false.
[b := true. "forgotten/crashed therefore no: s signal." ] fork.
Delay timeoutSemaphore: s afterMSecs: 1.
s wait.
c := b.
Transcript show: c; cr.

This shows the use case. Shorter even:
|s|
s := Semaphore new.
Delay timeoutSemaphore: s afterMSecs: 1.
s wait.
Transcript show: 'doesnt get here - not printed'; cr.

I hope I'm doing something wrong here!

I pasted the fixes from 6576 and 6588 to no avail.

I have this fix for my use-case: Avoid Delay>>timeoutSempaphore:afterMSecs:
|s|
s := Semaphore new.
[(Delay forSeconds:1) wait. s signal.] fork.
s wait.
Transcript show: 'Now gets here - line will be printed'; cr.

Resource intense, I know. I would love to use your Delay mechanism!
Cheers,
Georg


Reply | Threaded
Open this post in threaded view
|

Re: Simple lock up with delay + semaphore - not fixed with 0006576

Tom Phoenix
On Jan 4, 2008 2:58 PM, Georg Köster <[hidden email]> wrote:

> Delay timeoutSemaphore: s afterMSecs: 1.

Despite the comment in the source code, Delay
class>>timeoutSemaphore:afterMSecs: doesn't signal the given
semaphore; in fact, it never uses that parameter. That method should
be deprecated, if not simply removed. I've filed a bug report:

    http://bugs.squeak.org/view.php?id=6834

> I have this fix for my use-case: Avoid Delay>>timeoutSempaphore:afterMSecs:

> |s|
> s := Semaphore new.
> [(Delay forSeconds:1) wait. s signal.] fork.
> s wait.
> Transcript show: 'Now gets here - line will be printed'; cr.

> Resource intense, I know.

Why; because you used fork? Squeak processes don't work the way you
might think. I would recommend a solution along those exact lines.

Hope this helps!

--Tom Phoenix

Reply | Threaded
Open this post in threaded view
|

Re: Simple lock up with delay + semaphore - not fixed with 0006576

Georg Köster
Hi Tom,

Thanks for the quick response!

I thought the semaphore gets set as the semaphore that is signaled later... well the code is complicated.

I have another question:
Semaphore>>waitTimeoutMSecs:
| d |
    d := Delay timeoutSemaphore: self afterMSecs: (anInteger max: 0).
    self wait.
    "Signaling by the Delay here would over-signal the Sem."
    d unschedule.

A fix would need a sem for the sem I think... is that method usable?

Cheers again,
Georg

On Jan 5, 2008 12:37 AM, Tom Phoenix <[hidden email]> wrote:
On Jan 4, 2008 2:58 PM, Georg Köster < [hidden email]> wrote:

> Delay timeoutSemaphore: s afterMSecs: 1.

Despite the comment in the source code, Delay
class>>timeoutSemaphore:afterMSecs: doesn't signal the given
semaphore; in fact, it never uses that parameter. That method should
be deprecated, if not simply removed. I've filed a bug report:

   http://bugs.squeak.org/view.php?id=6834

> I have this fix for my use-case: Avoid Delay>>timeoutSempaphore:afterMSecs:

> |s|
> s := Semaphore new.
> [(Delay forSeconds:1) wait. s signal.] fork.
> s wait.
> Transcript show: 'Now gets here - line will be printed'; cr.

> Resource intense, I know.

Why; because you used fork? Squeak processes don't work the way you
might think. I would recommend a solution along those exact lines.

Hope this helps!

--Tom Phoenix




Reply | Threaded
Open this post in threaded view
|

Re: Simple lock up with delay + semaphore - not fixed with 0006576

Tom Phoenix
On Jan 4, 2008 6:57 PM, Georg Köster <[hidden email]> wrote:

> I have another question:
> Semaphore>>waitTimeoutMSecs:
> | d |
>     d := Delay timeoutSemaphore: self afterMSecs: (anInteger max: 0).
>     self wait.
>     "Signaling by the Delay here would over-signal the Sem."
>     d unschedule.
>
> A fix would need a sem for the sem I think... is that method usable?

Is that the source in your image for that method? Here's what I have:

        waitTimeoutMSecs: anInteger
                "Wait on this semaphore for up to the given number of milliseconds,
then timeout. It is up to the sender to determine the difference
between the expected event and a timeout."
       
                | waitingProcess wakeupProcess |
                waitingProcess _ Processor activeProcess.
                wakeupProcess _
                        [(Delay forMilliseconds: (anInteger max: 0)) wait.
                        self resumeProcess: waitingProcess] fork.
       
                self wait.
                wakeupProcess terminate.

Cheers!

--Tom Phoenix

Reply | Threaded
Open this post in threaded view
|

Re: Simple lock up with delay + semaphore - not fixed with 0006576

Andreas.Raab
In reply to this post by Georg Köster
Georg Köster wrote:

> I'm trying around with Croquet since a short while and stumbled over a
> bug in TPromise, but after a first fix there I also found that I could
> lock up my squeak (3.8.18beta1U on os x 10.5) as simply as this:
>
> |s b c|
> s := Semaphore new.
> b := false.
> [b := true. "forgotten/crashed therefore no: s signal." ] fork.
> Delay timeoutSemaphore: s afterMSecs: 1.
> s wait.
> c := b.
> Transcript show: c; cr.

I am not sure what you expect this to print. In a Croquet SDK image it
prints "true" which I presume is correct?

> This shows the use case. Shorter even:
> |s|
> s := Semaphore new.
> Delay timeoutSemaphore: s afterMSecs: 1.
> s wait.
> Transcript show: 'doesnt get here - not printed'; cr.
>
> I hope I'm doing something wrong here!

You must be. This works fine in a Croquet SDK image.

> I pasted the fixes from 6576 and 6588 to no avail.

None of these would even affect this behavior.

> Resource intense, I know. I would love to use your Delay mechanism!

And you should be able to. All of the above seem to work fine with the
Croquet images that I've tried - what images are you using it with?

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

Re: Simple lock up with delay + semaphore - not fixed with 0006576

Andreas.Raab
In reply to this post by Tom Phoenix
Tom Phoenix wrote:

> Is that the source in your image for that method? Here's what I have:
>
> waitTimeoutMSecs: anInteger
> "Wait on this semaphore for up to the given number of milliseconds,
> then timeout. It is up to the sender to determine the difference
> between the expected event and a timeout."
>
> | waitingProcess wakeupProcess |
> waitingProcess _ Processor activeProcess.
> wakeupProcess _
> [(Delay forMilliseconds: (anInteger max: 0)) wait.
> self resumeProcess: waitingProcess] fork.
>
> self wait.
> wakeupProcess terminate.

How odd. It seems that this version is only present in 3.9 - all other
images have a variant of the (much simpler) code:

waitTimeoutMSecs: anInteger
     "Wait on this semaphore for up to the given number of milliseconds,
then timeout. It is up to the sender to determine the difference between
the expected event and a timeout."
     | d |
     d := Delay timeoutSemaphore: self afterMSecs: (anInteger max: 0).
     [self wait] ensure:[d unschedule].

The 3.9 version is tagged "jf 12/2/2003" but there is no indication what
it is trying to address (if anything).

Cheers,
    - Andreas

Reply | Threaded
Open this post in threaded view
|

Re: Simple lock up with delay + semaphore - not fixed with 0006576

Alexander Lazarevic'
In reply to this post by Andreas.Raab
Andreas Raab schrieb:

> Georg Köster wrote:
>> This shows the use case. Shorter even:
>> |s|
>> s := Semaphore new.
>> Delay timeoutSemaphore: s afterMSecs: 1.
>> s wait.
>> Transcript show: 'doesnt get here - not printed'; cr.
>>
>> I hope I'm doing something wrong here!
>
> You must be. This works fine in a Croquet SDK image.
Unfortunately it seems that a ChangeSet of mine addressing [1]
introduced this bug. I guess it got included in 3.9 without anyone else
looking at it. As I mention in [2] the Delay class side method should read:

timeoutSemaphore: aSemaphore afterMSecs: anInteger
   "...comment..."

    ^ (self new setDelay: anInteger forSemaphore: aSemaphore) schedule

I guess trying to refactor too much made me change the method. I'm sorry
that I did not double check it more carefully in the first place.

Alex

[1] http://bugs.squeak.org/view.php?id=854
[2] http://bugs.squeak.org/view.php?id=6834

Reply | Threaded
Open this post in threaded view
|

Re: Simple lock up with delay + semaphore - not fixed with 0006576

Georg Köster
Hi Alex,

thanks for the note - but in my case I'm on the Croquet image and on the 3.8.18beta1U and my code looks like it should!

Georg

On Jan 5, 2008 1:47 PM, Alexander Lazarević < [hidden email]> wrote:
Andreas Raab schrieb:
> Georg Köster wrote:
>> This shows the use case. Shorter even:
>> |s|
>> s := Semaphore new.
>> Delay timeoutSemaphore: s afterMSecs: 1.
>> s wait.
>> Transcript show: 'doesnt get here - not printed'; cr.
>>
>> I hope I'm doing something wrong here!
>
> You must be. This works fine in a Croquet SDK image.
Unfortunately it seems that a ChangeSet of mine addressing [1]
introduced this bug. I guess it got included in 3.9 without anyone else
looking at it. As I mention in [2] the Delay class side method should read:

timeoutSemaphore: aSemaphore afterMSecs: anInteger
  "...comment..."

   ^ (self new setDelay: anInteger forSemaphore: aSemaphore) schedule

I guess trying to refactor too much made me change the method. I'm sorry
that I did not double check it more carefully in the first place.

Alex

[1] http://bugs.squeak.org/view.php?id=854
[2] http://bugs.squeak.org/view.php?id=6834




Reply | Threaded
Open this post in threaded view
|

Re: Simple lock up with delay + semaphore - not fixed with 0006576

Andreas.Raab
Georg Köster wrote:
> Hi Alex,
>
> thanks for the note - but in my case I'm on the Croquet image and on the
> 3.8.18beta1U and my code looks like it should!

So then what exactly is your problem? Like I was saying earlier, the
code you posted appears to work fine in the Croquet SDK image.

Cheers,
   - Andreas

>
> Georg
>
> On Jan 5, 2008 1:47 PM, Alexander Lazarević < [hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Andreas Raab schrieb:
>      > Georg Köster wrote:
>      >> This shows the use case. Shorter even:
>      >> |s|
>      >> s := Semaphore new.
>      >> Delay timeoutSemaphore: s afterMSecs: 1.
>      >> s wait.
>      >> Transcript show: 'doesnt get here - not printed'; cr.
>      >>
>      >> I hope I'm doing something wrong here!
>      >
>      > You must be. This works fine in a Croquet SDK image.
>     Unfortunately it seems that a ChangeSet of mine addressing [1]
>     introduced this bug. I guess it got included in 3.9 without anyone else
>     looking at it. As I mention in [2] the Delay class side method
>     should read:
>
>     timeoutSemaphore: aSemaphore afterMSecs: anInteger
>       "...comment..."
>
>        ^ (self new setDelay: anInteger forSemaphore: aSemaphore) schedule
>
>     I guess trying to refactor too much made me change the method. I'm sorry
>     that I did not double check it more carefully in the first place.
>
>     Alex
>
>     [1] http://bugs.squeak.org/view.php?id=854
>     [2] http://bugs.squeak.org/view.php?id=6834
>     <http://bugs.squeak.org/view.php?id=6834>
>
>
>
> ------------------------------------------------------------------------
>
>


Reply | Threaded
Open this post in threaded view
|

Re: Simple lock up with delay + semaphore - not fixed with 0006576

Georg Köster
In reply to this post by Andreas.Raab
Hey I prefer Tom's version! I shouldn't be racy if I read the resumeProcess and resume code correctly. The resumeProcess message has no effect on non-waiting processes!

>               | waitingProcess wakeupProcess |
>               waitingProcess _ Processor activeProcess.
>               wakeupProcess _
>                       [(Delay forMilliseconds: (anInteger max: 0)) wait.
>                       self resumeProcess: waitingProcess] fork.
>
>               self wait.
    "preempting here and getting the resumeProcess message sent would have no effect - therefore no race!"
    "in comparison having a semaphore getting signaled here would cause an excess signal on the sem: bad"
>               wakeupProcess terminate.

Cheers!
Georg

On Jan 5, 2008 9:43 AM, Andreas Raab <[hidden email]> wrote:
Tom Phoenix wrote:

> Is that the source in your image for that method? Here's what I have:
>
>       waitTimeoutMSecs: anInteger
>               "Wait on this semaphore for up to the given number of milliseconds,
> then timeout. It is up to the sender to determine the difference
> between the expected event and a timeout."
>
>               | waitingProcess wakeupProcess |
>               waitingProcess _ Processor activeProcess.
>               wakeupProcess _
>                       [(Delay forMilliseconds: (anInteger max: 0)) wait.
>                       self resumeProcess: waitingProcess] fork.
>
>               self wait.
>               wakeupProcess terminate.

How odd. It seems that this version is only present in 3.9 - all other
images have a variant of the (much simpler) code:

waitTimeoutMSecs: anInteger
    "Wait on this semaphore for up to the given number of milliseconds,
then timeout. It is up to the sender to determine the difference between
the expected event and a timeout."
    | d |
    d := Delay timeoutSemaphore: self afterMSecs: (anInteger max: 0).
    [self wait] ensure:[d unschedule].

The 3.9 version is tagged "jf 12/2/2003" but there is no indication what
it is trying to address (if anything).

Cheers,
   - Andreas




Reply | Threaded
Open this post in threaded view
|

Re: Simple lock up with delay + semaphore - not fixed with 0006576

Andreas.Raab
Georg Köster wrote:
> Hey I prefer Tom's version! I shouldn't be racy if I read the
> resumeProcess and resume code correctly. The resumeProcess message has
> no effect on non-waiting processes!

To the contrary. This code *introduces* a race condition when
manipulating the Semaphore's list of processes without protecting it
against concurrent modifications. Worse, the code *cannot be protected*
against concurrent modification since the VM manipulates that list on
its own. It may actually explain why some people have reported issues on
3.9 that do not seem to appear on 3.8 or 3.10 variants.

>
>  >               | waitingProcess wakeupProcess |
>  >               waitingProcess _ Processor activeProcess.
>  >               wakeupProcess _
>  >                       [(Delay forMilliseconds: (anInteger max: 0)) wait.
>  >                       self resumeProcess: waitingProcess] fork.
>  >
>  >               self wait.
>     "preempting here and getting the resumeProcess message sent would
> have no effect - therefore no race!"

What are you talking about? With the original code there was no race
condition whatsoever. If you think there is a race condition somewhere,
please explain in detail where you think that race condition is.

>     "in comparison having a semaphore getting signaled here would cause
> an excess signal on the sem: bad"

First, the original code was guarded with a call to #unschedule which
would avoid a second signal if the "real event" occurs long enough
before the "timeout event". On the other hand, neither implementation
prevents a double-signal if the real event occurs within an epsilon of
the timeout event. And due to the asynchronous nature of some semaphore
signals it is impossible to have a light-weight implementation like the
one provided in Semaphore handle this correctly.  It's possible to
handle this but this would require to separate the semaphore and the
delay signal and use a critical section to decide which one came first
and how to deal with the other. That, on the other hand, seems overkill
for the 99% of the cases in which the simplistic implementation in
3.8/3.10 is all that's needed.

Cheers,
   - Andreas




Reply | Threaded
Open this post in threaded view
|

Re: Simple lock up with delay + semaphore - not fixed with 0006576

Georg Köster
Hi Andreas, all,

first of all: Sorry, it also works in my fresh Croquet image. The backup I always used is apparently not as fresh as I thought :-( mince. And I didn't see that email :-( Double sorry.

Newbie qestion: Why are images restartable? Why not label them differently (for example as 'core dump') if they don't work?

To the race discussion. I think we have a point:
 § double signaling is unavoidable if a critical section is to be avoided

Therefore there is a race in the first implementation.

The second implementation has clearly other troubles (I didn't see that it's just using a LinkedList that's apparently not thread safe), but I explained before that I believe at least the double signaling is dealt with.

I would recommend at least adding this problem to the documentation of wait*, that if one of the timed waits is used the wait (without timeout) contract is violated in that it might return even if no user process sent a signal.

Best regards and thanks for considering my comments in the first place!
Georg

On Jan 5, 2008 3:05 PM, Andreas Raab <[hidden email] > wrote:
Georg Köster wrote:
> Hey I prefer Tom's version! I shouldn't be racy if I read the
> resumeProcess and resume code correctly. The resumeProcess message has
> no effect on non-waiting processes!

To the contrary. This code *introduces* a race condition when
manipulating the Semaphore's list of processes without protecting it
against concurrent modifications. Worse, the code *cannot be protected*
against concurrent modification since the VM manipulates that list on
its own. It may actually explain why some people have reported issues on
3.9 that do not seem to appear on 3.8 or 3.10 variants.

>
>  >               | waitingProcess wakeupProcess |
>  >               waitingProcess _ Processor activeProcess.
>  >               wakeupProcess _
>  >                       [(Delay forMilliseconds: (anInteger max: 0)) wait.
>  >                       self resumeProcess: waitingProcess] fork.
>  >
>  >               self wait.
>     "preempting here and getting the resumeProcess message sent would
> have no effect - therefore no race!"

What are you talking about? With the original code there was no race
condition whatsoever. If you think there is a race condition somewhere,
please explain in detail where you think that race condition is.

>     "in comparison having a semaphore getting signaled here would cause
> an excess signal on the sem: bad"

First, the original code was guarded with a call to #unschedule which
would avoid a second signal if the "real event" occurs long enough
before the "timeout event". On the other hand, neither implementation
prevents a double-signal if the real event occurs within an epsilon of
the timeout event. And due to the asynchronous nature of some semaphore
signals it is impossible to have a light-weight implementation like the
one provided in Semaphore handle this correctly.  It's possible to
handle this but this would require to separate the semaphore and the
delay signal and use a critical section to decide which one came first
and how to deal with the other. That, on the other hand, seems overkill
for the 99% of the cases in which the simplistic implementation in
3.8/3.10 is all that's needed.

Cheers,
  - Andreas







Reply | Threaded
Open this post in threaded view
|

Re: Simple lock up with delay + semaphore - not fixed with 0006576

Andreas.Raab
Georg Köster wrote:
> The second implementation has clearly other troubles (I didn't see that
> it's just using a LinkedList that's apparently not thread safe), but I
> explained before that I believe at least the double signaling is dealt
> with.

It most definitely does not prevent double signaling! What makes you
think it does? It is easiest to see in the case where we first get a
timeout signal, and then -while being inside #resumeProcess:- the "real"
event. Obviously there is nothing in the implementation that prevents
the second signal from occurring.

> I would recommend at least adding this problem to the documentation of
> wait*, that if one of the timed waits is used the wait (without timeout)
> contract is violated in that it might return even if no user process
> sent a signal.

Well, duh. It is the very *definition* of waitTimeoutMSecs: that it may
return although the event hasn't occurred. That's what the method does.
The comment is explicit, too noting that  "It is up to the sender to
determine the difference between the expected event and a timeout".

Cheers,
   - Andreas

Reply | Threaded
Open this post in threaded view
|

Re: Simple lock up with delay + semaphore - not fixed with 0006576

Georg Köster
Hi Andreas,

ok, I didn't explain it thorough enough. That's why meeting can be nice sometimes!

I'm keeping my two statements up.

To the first: The doubled signal and its resume would occur, but it would resume a process that is already resumed, and therefore have no effect (I assume that the process doesn't skip a wait because of a resume received while running). The process can only be waiting again and therefore vulnerable to a doubled signal if he passes the termination of the interrupting thread - and past this termination no signaling from the dead process can occur. So yes, double signaling may occur - but it's dealt with.

To the second: Yes, definition of waitTimeoutMSecs:, but not wait:! That's what I meant when I said 'if one of the timed waits [waitTimeoutMSecs or -Seconds] is used the wait (without timeout) contract is violated...'
It is very normal that the caller needs to check the result after using a timed wait, but nothing in the doc states that also the processes calling a forever-blocking wait will now need to check the result. That should be added. Something along the lines of:

Semaphore>>wait
"...
The calling process is required to check if it was signaled by some external means as the signaling might occur bogusly."
...

Isn't this the consequence of the possible double signaling?

Cheers,
Georg

On Jan 5, 2008 4:28 PM, Andreas Raab <[hidden email]> wrote:
Georg Köster wrote:
> The second implementation has clearly other troubles (I didn't see that
> it's just using a LinkedList that's apparently not thread safe), but I
> explained before that I believe at least the double signaling is dealt
> with.

It most definitely does not prevent double signaling! What makes you
think it does? It is easiest to see in the case where we first get a
timeout signal, and then -while being inside #resumeProcess:- the "real"
event. Obviously there is nothing in the implementation that prevents
the second signal from occurring.

> I would recommend at least adding this problem to the documentation of
> wait*, that if one of the timed waits is used the wait (without timeout)
> contract is violated in that it might return even if no user process
> sent a signal.

Well, duh. It is the very *definition* of waitTimeoutMSecs: that it may
return although the event hasn't occurred. That's what the method does.
The comment is explicit, too noting that  "It is up to the sender to
determine the difference between the expected event and a timeout".

Cheers,
  - Andreas




Reply | Threaded
Open this post in threaded view
|

Real profiling: Why is my UI so slow?

Tapple Gao
In reply to this post by Georg Köster
Hi all. While playing around with my DeltaStreamBrowser, I see
that it takes four seconds for the UI to update after I click on
a change. And I have no idea why.

I've often wondered why the browser is so slow too, and now I
have the need to find out. So, How does one go about debugging
this?

I know about MessageTally and TimeProfileBrowser, but I've only
seen them used on well-defined code, not in the wild in the
Squeak UI. And I know about world > debug > Message Tally, but I
really have no idea what the numbers they report mean, or what
to look for in the reports they return, or how to even know what
messages were sent while actually processing my click vs. in
between while I move the mouse around.

What I would really like to happen is, I tell squeak: "Pop-up a
debugger on the next click". Then, on the next click, the UI
process is suspended and shown in a debugger that has it's own
private UI loop. Is anything like this possible? I am sure the
Morphic event loop was not created without the help of the
debugger, but I really have no idea how to trace a UI event from
post to screen-update.

So, that is what I have to work with, and how I don't know what
to do with it. So, how do I track down the latency problem?

--
Matthew Fulmer -- http://mtfulmer.wordpress.com/
Help improve Squeak Documentation: http://wiki.squeak.org/squeak/808

Reply | Threaded
Open this post in threaded view
|

Re: Real profiling: Why is my UI so slow?

Tom Phoenix
On Jan 25, 2008 10:25 PM, Matthew Fulmer <[hidden email]> wrote:

> I've often wondered why the browser is so slow too, and now I
> have the need to find out. So, How does one go about debugging
> this?

Have you tried interrupting it? While you're waiting for Squeak to
respond, use the Cmd-period (Alt-period) interrupt to bring up a
debugger (and a new main event loop, if needed). From there, you may
still have a slow UI, but it should be fairly straightforward to see
what method is taking a long time to respond. (My money is on #hash,
but you never know.)

Good luck with it!

--Tom Phoenix

Reply | Threaded
Open this post in threaded view
|

Re: Real profiling: Why is my UI so slow?

Jerome Peace
In reply to this post by Tapple Gao
Real profiling: Why is my UI so slow?

Hi Matthew,

I have seen some interesting problems with the image
and browser speed.
The slowness seems to come when the browser is
updating itself and that can be triggered by other
morphs overlapping (e.g. other browsers).  

See the mantis report:

http://bugs.squeak.org/view.php?id=2084
0002084: Each overlaping browser slows down screen
updates considerably. Caching the image would speed
things up.

This is mostly external observations. At the time I
reported this I hadn't learned to use the tally tools
usefully. (I still haven't. They give too much
information. I want something that will point me
quickly to the culprit.)

My observation is that caching images would be one way
to speed things up. I haven't tried to tackle the
problem directly. (Except to make 3.9 workspaces start
out smaller. The less things overlap the smaller the
problem.)

Animated morphs are a good way to aggrivate the
problem. By stepping and invalidating their bounds.

Hmm, I wonder if a one pixel rectangle would be good
for pinpointing the worst update areas?.Invalidate the
rectangle each step. Move it around the screen to see
which pixel causes the greatest slowdown. The step
code could note the clock interval between steps.
Color a map of the screen with the results.

Alright, we have a crazy idea. Lets see where it gets
us. I'll post something to mantis when it gets
interesting.

Cheers.

Yours in curiosity and service, --Jerome Peace

***
>Matthew Fulmer tapplek at gmail.com
>Sat Jan 26 06:25:35 UTC 2008
>
>
>Hi all. While playing around with my
DeltaStreamBrowser, I see
>that it takes four seconds for the UI to update after
I click on
>a change. And I have no idea why.
>
>I've often wondered why the browser is so slow too,
and now I
>have the need to find out. So, How does one go about
debugging
>this?
>
>I know about MessageTally and TimeProfileBrowser, but
I've only
>seen them used on well-defined code, not in the wild
in the
>Squeak UI. And I know about world > debug > Message
Tally, but I
>really have no idea what the numbers they report
mean, or what
>to look for in the reports they return, or how to
even know what
>messages were sent while actually processing my click
vs. in
>between while I move the mouse around.
>
>What I would really like to happen is, I tell squeak:
"Pop-up a
>debugger on the next click". Then, on the next click,
the UI
>process is suspended and shown in a debugger that has
it's own
>private UI loop. Is anything like this possible? I am
sure the
>Morphic event loop was not created without the help
of the
>debugger, but I really have no idea how to trace a UI
event from
>post to screen-update.
>
>So, that is what I have to work with, and how I don't
know what
>to do with it. So, how do I track down the latency
problem?
***




      ____________________________________________________________________________________
Looking for last minute shopping deals?  
Find them fast with Yahoo! Search.  http://tools.search.yahoo.com/newsearch/category.php?category=shopping

Reply | Threaded
Open this post in threaded view
|

Re: Real profiling: Why is my UI so slow?

timrowledge

On 26-Jan-08, at 10:07 PM, Jerome Peace wrote:
>
>
> http://bugs.squeak.org/view.php?id=2084
> 0002084: Each overlaping browser slows down screen
> updates considerably. Caching the image would speed
> things up.
It appears to be not just 'considerably' but something along the lines  
of O(N^2) or worse. Run a loop that opens browsers and watch as the  
first few are really fast, then a few slow then

it

gets

really




sl


o


w

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Strange OpCodes: RCM: Randomly Corrupt Microcode



Reply | Threaded
Open this post in threaded view
|

Re: Real profiling: Why is my UI so slow?

Igor Stasenko
On 27/01/2008, tim Rowledge <[hidden email]> wrote:

>
> On 26-Jan-08, at 10:07 PM, Jerome Peace wrote:
> >
> >
> > http://bugs.squeak.org/view.php?id=2084
> > 0002084: Each overlaping browser slows down screen
> > updates considerably. Caching the image would speed
> > things up.
> It appears to be not just 'considerably' but something along the lines
> of O(N^2) or worse. Run a loop that opens browsers and watch as the
> first few are really fast, then a few slow then
>
> it
>
> gets
>
> really
>
>
>
>
> sl
>
>
> o
>
>
> w
>

No, it's O(N) , but when you having 10 browser windows on screen, full
world update could take about 2-3 seconds.

I'm using this code for testing draw speed:

MessageTally spyOn: [ World fullDrawOn: World assuredCanvas ]

In my image, most time is spent in string drawing primitive.

--
Best regards,
Igor Stasenko AKA sig.

Reply | Threaded
Open this post in threaded view
|

Re: Real profiling: Why is my UI so slow?

Tapple Gao
In reply to this post by Jerome Peace
On Sat, Jan 26, 2008 at 10:07:03PM -0800, Jerome Peace wrote:
> This is mostly external observations. At the time I
> reported this I hadn't learned to use the tally tools
> usefully. (I still haven't. They give too much
> information. I want something that will point me
> quickly to the culprit.)

I talked to Bryce about this on #squeak, and he gave several
suggestions:

- In MessageTally reports, look for big drops in time usage from
  a parent to child method. That usually points toward the
  culprit
- Insert logging statements in code. In Keith's Logging
  framework (http://wiki.squeak.org/squeak/3706), I added a
  method just to make logging profiles easy. It helps to see
  when a method is getting called way too often.

I still don't see how to use the debugger effectively to profile
UI's, since events will be posted from the debugger thread, but
screen updates happen in the UI thread, which is impossible to
get a useful debugger on (as far as I can tell)

--
Matthew Fulmer -- http://mtfulmer.wordpress.com/
Help improve Squeak Documentation: http://wiki.squeak.org/squeak/808

12