The recent thread on Ctrl-Break inspired me to try out a few things. I thought I had something, using a high-priority process that could be used to interrupt the main user interface process. But, what worked in D5 didn't in D6.

I did some more experimenting and came to the rather disturbing conclusion that tight computations in D6 can also keep higher priority Smalltalk processes from running. This can be demonstrated with:

---
go := true.
log := OrderedCollection new.
[[Processor sleep: 1000.
  log add: Time now -> 'interrupt awake'.
  go] whileTrue] forkAt: Processor userInterruptPriority.
log add: Time now -> 'start loop'.
Time millisecondsToRun: [20000000 timesRepeat: [2.0 + 17 / (3 * 2 + 42)]].
log add: Time now -> 'loop done'.
go := false.
log inspect.
---

Adjust the constant (starting small) to have the computation take a reasonable amount of time, say around 10 seconds.

On D6 the higher priority process only runs after the tight computation is done. On D5 it wakes up once each second (as it should). You can change the computation in various ways that _will_ allow the process context switching to occur. For example, simply using "2 + 2.0" will work, with its primitive failure and coercion of the SmallInteger into a Float.

It's certainly not clear to me just what VM operations need to be occurring during a computation so that process context switching will happen. Can anyone shed some light? It would be good to gauge how likely it is that this situation might arise in an actual application. I'm also curious if this is related to the issue of getting Ctrl-Break to respond.

A related observation: the times for the computation using D5 were #(7334 7286 7211) while in D6 they were #(9439 9276 9301), or 28% slower! Granted, this is just one data point and there may be good reasons for the change, but still, I don't like the direction.

-- 
Bill Dargel
[hidden email]
Shoshana Technologies
100 West Joy Road, Ann Arbor, MI 48105 USA
Bill,
> I did some more experimenting and came to the rather disturbing
> conclusion that tight computations in D6 can also keep higher priority
> Smalltalk processes from running.

I see the same effect, including D6 being slower on this example than D5, on WinXP. Interesting...

> I'm also curious if this is related to the issue of getting ctrl-break
> to respond.

I think they must be connected -- the un-preemptable loop is not interruptible by Ctrl-Break, but changing the calculation to make it pre-emptable also makes it interruptible. Presumably the VM checks for Ctrl+Break at the same time as its re-scheduling code runs.

-- chris
"Bill Dargel" <[hidden email]> wrote in message
news:[hidden email]... > The recent thread on Ctrl-Break inspired me to try out a few things. I > thought I had something, using a high-priority process that could be used > to interrupt the main user interface process. > > But, what worked in D5 didn't in D6. > > I did some more experimenting and came to the rather disturbing conclusion > that tight computations in D6 can also keep higher priority Smalltalk > processes from running. Hi Bill, thanks for the report. Sorry for the delayed response. This issue is due mainly to a compiler change that has made the bytecodes it generates for some loops get out of sync with the VM's assumptions. In D5 the compiler used to generate rather inefficient code for some loops (such as #timesRepeat:) which contained a conditional jump at the start of the loop, and an unconditional jump at the end. Both jumps were executed for every loop iteration. In D6 the compiler uses the more efficient implementation of jumping unconditionally to the end of the loop when the loop starts, and having only a single conditional jump back on loop iterations. Unfortunately the VM still assumes the old behaviour, and only tests for asynchronous events (such as timer interrupts and Ctrl+Break) on certain unconditional backwards jumps. This means that tight loops which do not contain any actual message sends will be uninterruptable. It also means that tight loops which contain relatively few message sends may take a very long time respond to Ctrl+Break, since this is expensive to check for and so is sampled only after executing a relatively large number of message sends/jumps. The actual sampling interval will depend on the speed of your machine, but will probably be around 2^20 on modern machines. The calculation of the sampling interval was based on a trade off between overhead and responsiveness to asynchronous events that was designed back in the early days of Dolphin. It may no longer be appropriate. > ... 
> Adjust the constant (starting small) to have the computation take a
> reasonable amount of time, say around 10 seconds.
>
> On D6 the higher priority process only runs after the tight computation is
> done. On D5 it wakes up once each second (as it should). You can change
> the computation in various ways that _will_ allow the process context
> switching to occur. For example simply using "2 + 2.0" will work, with its
> primitive failure and coercing of the SmallInteger into a Float.

This is because a number of true message sends are then needed to perform the computation. You should also find that Ctrl+Break will now work almost immediately too.

> It's certainly not clear to me just what VM operations need to be
> occurring during a computation so that process context switching will
> happen. Can anyone shed some light?

Just ensure some actual (unoptimised) messages are sent. You can identify optimised message sends as they are labelled "special" in the disassembly.

> ...Would be good to gauge how likely it is that this situation might arise
> in an actual application.

Unlikely, I would say, unless your application contains tight loops that do nothing but arithmetic.

> ...I'm also curious if this is related to the issue of getting ctrl-break
> to respond.

Yes it is.

> A related observation: the times for the computation using D5 were #(7334
> 7286 7211) while in D6 they were #(9439 9276 9301), or 28% slower!
> Granted, this is just one data point and there may be good reasons for the
> change, but still, I don't like the direction.

The speed of this loop is dominated entirely by the floating point arithmetic, which in turn is dominated by allocation time. Allocation of floats in a tight loop like this is slower in D6. Indeed you may find there are some micro-benchmarks on which D5 is faster. I'm not really concerned about most of these, however, since they are mostly insignificant in relation to overall application performance.
On more macro benchmarks D6 should be faster, sometimes substantially so. Where it's not, I'm interested to hear about it, but remember that overall performance is a balance and sometimes trade-offs are necessary.

Regards

Blair
Blair wrote:
> Hi Bill, thanks for the report.

Thanks for the explanation!

> ... The calculation of the sampling interval was based on a trade-off
> between overhead and responsiveness to asynchronous events that was
> designed back in the early days of Dolphin. It may no longer be
> appropriate.

After reading this description of counting message sends and unconditional backwards jumps as a way to estimate elapsed time, my deja-vu o'meter started to ring. Here's a link to the end of a discussion we had three years ago entitled "External calls and UI responsiveness":
http://groups.google.com/group/comp.lang.smalltalk.dolphin/msg/1f6c18c0dc194fac

Back then, the cause of low message/jump counts was external calls, instead of tight loops (with the erroneous VM assumption). And what was being starved was checking of the Windows message queue, instead of checking for Ctrl-Break or the Delay timer event. But the underlying problem sure seems the same -- using that message/jump counter as an approximation of elapsed time.

IMNAWP (I am not a Windows programmer), but wouldn't it be possible to have a Windows thread wait on a periodic timer and set or increment a memory location that the main VM process could then test, just like it tests the message/jump count? Seems like the overhead for the main VM process would be exactly the same as it is now. Perhaps even less, if it no longer needed to maintain the message count.

-- 
Bill Dargel
[hidden email]
Shoshana Technologies
100 West Joy Road, Ann Arbor, MI 48105 USA
Bill Dargel wrote:
> IMNAWP (I am not a Windows programmer), but wouldn't it be possible to
> have a Windows thread wait on a periodic timer and set or increment a
> memory location that the main VM process could then test, just like it
> tests the message/jump count? Seems like the overhead for the main VM
> process would be exactly the same as it is now. Perhaps even less, if it
> no longer needed to maintain the message count.

Yes, it is possible. This is what I did in VW. Up through VW 3.0 we had a counter decremented at the start of any method that built a frame (i.e. no count for non-failing primitives or simple inst var accessors). The system would poll every 1024 counts (I think). First, this produces a ridiculously high event-poll frequency on fast machines running normal Smalltalk code. Second, it wouldn't poll nearly often enough if certain long-running primitives (e.g. large integer arithmetic) are in use.

I remember in ~1997 having a test case that took 30 seconds from typing Control-C for the system to interrupt the running process. This approach was required for Win3.1 support, which didn't have true multi-threading. So in ~1998/1999 I threw it out and added a high-priority thread that looped waiting on a Windows semaphore with a 30 ms timeout, which caused the VM to poll at a steady 30 milliseconds unless it was asleep. This actually improved base compute performance on WinNT et al. by nearly 50% (i.e. nearly doubled performance), mainly by removing an extremely expensive read-modify-write cycle on method activation.

-- 
_______________,,,^..^,,,____________________________
Eliot Miranda              Smalltalk - Scene not herd