some observations
(a) In the past we would call checkForInterrupts. One of the interesting things with that routine was that it used interruptCheckCounter to modulate how often it was called. Thus in the bytecode longUnconditionalJump /* begin internalQuickCheckForInterrupts */ if ((foo->interruptCheckCounter -= 1) <= 0) We tried only to call checkForInterrupts every 1 millisecond or so by incrementing, decrementing interruptCheckCounter to some large number. See my note on the squeak mailing list "VM tuning results and a question or two?" November 16, 1999 11:09:04 PM PST (CA) We would of course set that to a negative number if we wanted a more instant response. However now the HydraCode promptly does a return from interp on each jump backwards at some unknown cost, I'd guess this affects looping to some interesting value? (b) Buried in checkForInterrupts was code to thump TheTimerSemaphore when needed and to watch for millisecond clock rollover. But now it seems the platform implementer is now require to build a pthread based routine to mange the next wakeup time, and or do something clever elsewhere to watch when the millisecond clock rolls over and/or exceeds the next expected wakeup time. However that implies each implementation will have the same behaviour, where as before it was magically handled by the VM and not requiring lots of independently written operating system timer magic. As of now I don't have a workable vm since I need to implement this timer thread logic and one hopes that will solve the where we lockup the carbon event handler as something fails to process the draw window request at window activation properly. -- = = = ======================================================================== John M. McIntosh <[hidden email]> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com = = = ======================================================================== |
2008/5/3 John M McIntosh <[hidden email]>:
> However now the Hydra code promptly does a return from interp on each
> jump backwards at some unknown cost, I'd guess this affects looping to
> some interesting value?

No, it changes the way interrupts are handled. Instead of the optimistic
way (check every 1 msec and hope there is something to handle), it is now
more deterministic: interrupt if and only if there are events to handle.
The cost of returning from the interpret() function and then calling it
again is too small to take into account, and how often interpret() is
interrupted now depends only on how often you generate events. But the
cost of generating an event and putting it in the queue is much higher
than simply returning and calling the function again, so even if you
flood the VM with events, the bottleneck will be not in interpret() but
in your event-generation code. :)

In fact I observed a small speedup (about 0.8%) on my PC when I finished
refactoring to the new model.

Also, making interpret() reentrant leads to some other benefits, not yet
exploited, but I'm sure you can see for yourself what perspectives it
opens. One obvious one is that I can change the loop from while (1) to
while (!terminated), which will allow me to handle killing an interpreter
instance in a graceful way, without quitting the VM.
Comparing to the old VM: the only way to terminate the interpreter there
is to quit the OS process, which is what it actually does. But in
HydraVM, the quit primitive should be changed to stop one interpreter
instance while continuing to work with the other instances.

From a comment in the mail you pointed to:

> I think the big problem is the timer accuracy which currently depends on the
> frequency of #checkForInterrupts. If you increase the #interruptCheckCounter
> your timer accuracy will decrease. Though, of course, one could adjust the
> interruptCheckCounter based on whether or not a timer is active (or even how
> far in the future the next timer tick is).

Now, in Hydra, there is no such dependency.

> Good question. For user interrupts, once or twice per second is possibly
> sufficient. For the timer it should be as accurate as possible. For external
> semaphores or finalization it should be as soon as these are signaled.

All semaphores are signaled by generating events, and as I already
stated, any event pushed to the interpreter event queue causes the
interpreter loop to return as soon as possible.

> (b) Buried in checkForInterrupts was code to thump TheTimerSemaphore
> when needed and to watch for millisecond clock rollover.
>
> But now it seems the platform implementer is required to build a
> pthread-based routine to manage the next wakeup time, and/or do
> something clever elsewhere to watch when the millisecond clock rolls
> over and/or exceeds the next expected wakeup time.
>
> However that implies each implementation will have the same behaviour,
> whereas before it was magically handled by the VM, not requiring lots
> of independently written operating system timer magic.
>
> As of now I don't have a workable VM since I need to implement this
> timer thread logic, and one hopes that will solve the issue where we
> lock up the Carbon event handler as something fails to process the
> draw window request at window activation properly.
Consider that my implementation was based on observations of what the
Windows VM is doing. It already used a multimedia timer, set to trigger
checking for interrupts every 1 msec (or at the minimum period the OS
can provide, but not less than 1 msec), and those routines created their
own timer thread, hidden from the eyes of the developer. So what I did
was just replace that thread with my own implementation. Also, because
of the multimedia timer, I saw an overlap: the optimistic
interruptCheckCounter handling was doing the same thing the multimedia
timer does, which, IMHO, can't be considered a good algorithm.

After refactoring, I got code which handles timers with better accuracy
than the old VM. (Read a topic from some time back about Delay
accuracy.) In fact, I was surprised when I saw that my implementation
provides more accurate timers; I expected it to be worse. :)

--
Best regards,
Igor Stasenko AKA sig.
> Consider that my implementation was based on observations of what the
> Windows VM is doing. It already used a multimedia timer, set to trigger
> checking for interrupts every 1 msec, and those routines created their
> own timer thread, hidden from the eyes of the developer. So what I did
> was just replace that thread with my own implementation.
>
> After refactoring, I got code which handles timers with better accuracy
> than the old VM. In fact, I was surprised when I saw that my
> implementation provides more accurate timers; I expected it to be
> worse. :)

Ok, I'll have to run some benchmarks. When I changed the Macintosh VM to
pound the interrupt-delay logic 1000 times a second, back in the era of
500 MHz machines, the impact on performance was noticeable. Maybe today
no one cares; maybe the folks chasing "why does opening windows take 2x
as long" can fix that, and then, mmm, we'll consume part of the gain
back in overhead to improve Delay accuracy.

Maybe I can clock-watch before handleEvents() to avoid the overhead of a
timer routine. I note we used to have clock watching on each primitive
call in ages past, but removed it, since we found that under certain
conditions one could spend a noticeable percentage of time just getting
the clock; some vestiges of that lurk via the now non-existent low-res
millisecond clock function. Maybe it's not noticeable now. Ya,
benchmarking first...

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================
Oops, didn't notice that the message was sent only to John.
---------- Forwarded message ----------
From: Igor Stasenko <[hidden email]>
Date: 2008/5/3
Subject: Re: [squeak-dev] hydra vm update
To: [hidden email]

2008/5/3 John M McIntosh <[hidden email]>:
> Ok, I'll have to run some benchmarks. When I changed the Macintosh VM
> to pound the interrupt-delay logic 1000 times a second, back in the era
> of 500 MHz machines, the impact on performance was noticeable. Maybe
> today no one cares; maybe the folks chasing "why does opening windows
> take 2x as long" can fix that, and then we'll consume part of the gain
> back in overhead to improve Delay accuracy.

My benchmarks show the opposite: tinyBenchmarks runs faster with the new
model than with the old checkForInterrupts, and Delay accuracy is
improved. Maybe this won't hold on other platforms, but on Windows I got
a gain in both areas, at the cost of implementing my own timer thread
routine. :)

> Maybe I can clock-watch before handleEvents() to avoid the overhead of
> a timer routine.
> I note we used to have clock watching on each primitive call in ages
> past, but removed it since we found that under certain conditions one
> could spend a noticeable percentage of time just getting the clock;
> some vestiges of that lurk via the non-existent low-res millisecond
> clock function. Maybe it's not noticeable now.
>
> Ya, benchmarking first...

Also note that HydraVM is targeted at multicore CPUs. If the timer
thread resides on a different core than the interpreter thread, there is
no scheduler overhead to switch the active thread, and this makes delay
handling even more accurate.

Just ran the test again on a quad-core box with the same image:

    | delay bag |
    delay := Delay forMilliseconds: 1.
    bag := Bag new.
    1000 timesRepeat: [bag add: [delay wait] timeToRun].
    bag sortedCounts

HydraVM:
    a SortedCollection(932->2 67->1 1->3)
    a SortedCollection(932->2 68->1)
    a SortedCollection(932->2 68->1)

Croquet VM:
    a SortedCollection(952->2 48->1)
    a SortedCollection(951->2 48->1 1->4)
    a SortedCollection(951->2 46->1 3->3)

This can be interpreted in one of the following ways:
- Hydra delays are more accurate;
- or they have similar (or even lower) accuracy, but the VM spends less
  time signaling the semaphore, so we get noticeably more 1 msec
  results.

No wonder benchmarks on the vanilla VM are faster:

Croquet VM:
    1 tinyBenchmarks
    '485768500 bytecodes/sec; 14611978 sends/sec'
    '486229819 bytecodes/sec; 14520007 sends/sec'
    '485768500 bytecodes/sec; 14554360 sends/sec'

    [ 1 tinyBenchmarks ] timeToRun
    5337
    5336
    5402

HydraVM:
    1 tinyBenchmarks
    '454706927 bytecodes/sec; 13731345 sends/sec'
    '455516014 bytecodes/sec; 13363453 sends/sec'
    '456327985 bytecodes/sec; 13210400 sends/sec'
    '453900709 bytecodes/sec; 13373136 sends/sec'

    [ 1 tinyBenchmarks ] timeToRun
    5692
    5770
    5796

Not sure if using timeToRun is fair here, because tinyBenchmarks itself
uses timeToRun to determine one of its parameters. This actually shows
the overhead of introducing the interpreter as an argument to each
function.
However, it would be interesting to add an option to build the VM using
thread-local storage, to minimize the impact of introducing multiple
interpreter instances. As a bonus we would have full compatibility with
the old primitives, because we wouldn't need to pass the interpreter
instance as an argument. But it was a design choice, and we decided to
pass the interpreter as an extra argument instead of using thread-local
storage.

--
Best regards,
Igor Stasenko AKA sig.