some observations
(a) In the past we would call checkForInterrupts. One of the interesting things with that routine was that it used interruptCheckCounter to modulate how often it was called. Thus in the bytecode longUnconditionalJump /* begin internalQuickCheckForInterrupts */ if ((foo->interruptCheckCounter -= 1) <= 0) We tried only to call checkForInterrupts every 1 millisecond or so by incrementing, decrementing interruptCheckCounter to some large number. See my note on the squeak mailing list "VM tuning results and a question or two?" November 16, 1999 11:09:04 PM PST (CA) We would of course set that to a negative number if we wanted a more instant response. However now the HydraCode promptly does a return from interp on each jump backwards at some unknown cost, I'd guess this affects looping to some interesting value? (b) Buried in checkForInterrupts was code to thump TheTimerSemaphore when needed and to watch for millisecond clock rollover. But now it seems the platform implementer is now require to build a pthread based routine to mange the next wakeup time, and or do something clever elsewhere to watch when the millisecond clock rolls over and/or exceeds the next expected wakeup time. However that implies each implementation will have the same behaviour, where as before it was magically handled by the VM and not requiring lots of independently written operating system timer magic. As of now I don't have a workable vm since I need to implement this timer thread logic and one hopes that will solve the where we lockup the carbon event handler as something fails to process the draw window request at window activation properly. -- = = = ======================================================================== John M. McIntosh <[hidden email]> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com = = = ======================================================================== |
2008/5/3 John M McIntosh <[hidden email]>:
> However now the Hydra code promptly does a return from interp on each
> jump backwards at some unknown cost, I'd guess this affects looping to
> some interesting value?

No, it changes the way interrupts are handled. Instead of the optimistic
way (check every 1 msec and hope there is something to handle), it is now
more deterministic: interrupt if and only if there are events to handle.
The cost of returning from the interpret() function and then calling it
again is too small to take into account, and how often interpret() is
interrupted now depends only on how often you generate events. But the
cost of generating an event and putting it in the queue is much higher
than simply returning and calling the function again, so even if you
flood the VM with events, the bottleneck will be not in interpret() but
in your event-generation code. :)

In fact I observed a small speedup (about 0.8%) on my PC when I finished
refactoring to the new model.

Also, making interpret() reentrant leads to some other benefits, not yet
exploited, but I'm sure you can see for yourself what perspectives it
opens. One obvious one is that I can change the loop from while (1) to
while (!terminated), which will allow me to handle killing an interpreter
instance in a graceful way, without quitting the VM.
Comparing to the old VM: the only way to terminate the interpreter there
is to quit the OS process, which is what it actually does. But in
HydraVM, the quit primitive should be changed to stop one interpreter
instance while continuing to work with the other instances.

From a comment in the mail you pointed to:

> I think the big problem is the timer accuracy which currently depends on the
> frequency of #checkForInterrupts. If you increase the #interruptCheckCounter
> your timer accuracy will decrease. Though, of course, one could adjust the
> interruptCheckCounter based on whether or not a timer is active (or even how
> far in the future the next timer tick is).

Now, in Hydra, there is no such dependency.

> Good question. For user interrupts, once or twice per second is possibly
> sufficient. For the timer it should be as accurate as possible. For external
> semaphores or finalization it should be as soon as these are signaled.

All semaphores are signaled by generating events, and as I already
stated, any event pushed to the interpreter event queue causes the
interpreter loop to return as soon as possible.

> (b) Buried in checkForInterrupts was code to thump TheTimerSemaphore
> when needed and to watch for millisecond clock rollover.
>
> But now it seems the platform implementer is required to build a
> pthread-based routine to manage the next wakeup time, and/or do
> something clever elsewhere to watch when the millisecond clock rolls
> over and/or exceeds the next expected wakeup time.
>
> However that implies each implementation will have the same behaviour,
> whereas before it was magically handled by the VM, not requiring lots
> of independently written operating system timer magic.
>
> As of now I don't have a workable VM since I need to implement this
> timer thread logic, and one hopes that will solve the issue where we
> lock up the Carbon event handler as something fails to process the
> draw window request at window activation properly.
Consider that my implementation was based on observations of what the
Windows VM is doing. It already used a multimedia timer, set to trigger
checking for interrupts every 1 msec (or at the minimum period the OS
can provide, but not less than 1 msec), and those routines created their
own timer thread, hidden from the eyes of the developer. So what I did
was just replace that thread with my own implementation. Also, because
of the multimedia timer, I saw an overlap: the optimistic
interruptCheckCounter handling was doing the same thing the multimedia
timer does, which, IMHO, can't be considered a good algorithm.

After refactoring, I got code which handles timers with better accuracy
than the old VM. (Read a topic from some time back about Delay
accuracy.) In fact, I was surprised when I saw that my implementation
provides more accurate timers; I expected it to be worse. :)

--
Best regards,
Igor Stasenko AKA sig.
> Consider that my implementation was based on observations of what the
> Windows VM is doing. It already used a multimedia timer, set to trigger
> checking for interrupts every 1 msec, and those routines created their
> own timer thread, hidden from the eyes of the developer. So what I did
> was just replace that thread with my own implementation.
>
> After refactoring, I got code which handles timers with better accuracy
> than the old VM. In fact, I was surprised when I saw that my
> implementation provides more accurate timers; I expected it to be
> worse. :)

Ok, I'll have to run some benchmarks. When I changed the Macintosh VM to
pound the interrupt-delay logic 1000 times a second, back in the era of
500 MHz machines, the impact on performance was noticeable. Maybe today
no one cares; maybe the folks chasing "why does opening windows take 2x
as long" can fix that, and then, mmm, we'll consume part of the gain
back in overhead to improve Delay accuracy.

Maybe I can clock-watch before handleEvents() to avoid the overhead of a
timer routine. I note we used to have clock watching on each primitive
call in ages past, but removed it, since we found that under certain
conditions one could spend a noticeable percentage of time just getting
the clock; some vestiges of that lurk via the now non-existent low-res
millisecond clock function. Maybe it's not noticeable now. Ya,
benchmarking first...

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================
Oops, didn't notice that the message was sent only to John.
---------- Forwarded message ----------
From: Igor Stasenko <[hidden email]>
Date: 2008/5/3
Subject: Re: [squeak-dev] hydra vm update
To: [hidden email]

2008/5/3 John M McIntosh <[hidden email]>:
> Ok, I'll have to run some benchmarks. When I changed the Macintosh VM
> to pound the interrupt-delay logic 1000 times a second, back in the era
> of 500 MHz machines, the impact on performance was noticeable. Maybe
> today no one cares; maybe the folks chasing "why does opening windows
> take 2x as long" can fix that, and then we'll consume part of the gain
> back in overhead to improve Delay accuracy.

My benchmarks show the opposite: tinyBenchmarks runs faster with the new
model than with the old checkForInterrupts, and Delay accuracy is
improved. Maybe this won't hold on other platforms, but on Windows I got
a gain in both areas, at the cost of implementing my own timer thread
routine. :)

> Maybe I can clock-watch before handleEvents() to avoid the overhead of
> a timer routine.
> I note we used to have clock watching on each primitive call in ages
> past, but removed it since we found that under certain conditions one
> could spend a noticeable percentage of time just getting the clock;
> some vestiges of that lurk via the non-existent low-res millisecond
> clock function. Maybe it's not noticeable now.
>
> Ya, benchmarking first...

Also note that HydraVM is targeted at multicore CPUs. If the timer
thread resides on a different core than the interpreter thread, there is
no scheduler overhead to switch the active thread, and this makes delay
handling even more accurate.

Just ran the test again on a quad-core box with the same image:

    | delay bag |
    delay := Delay forMilliseconds: 1.
    bag := Bag new.
    1000 timesRepeat: [bag add: [delay wait] timeToRun].
    bag sortedCounts

HydraVM:
    a SortedCollection(932->2 67->1 1->3)
    a SortedCollection(932->2 68->1)
    a SortedCollection(932->2 68->1)

Croquet VM:
    a SortedCollection(952->2 48->1)
    a SortedCollection(951->2 48->1 1->4)
    a SortedCollection(951->2 46->1 3->3)

This can be interpreted in one of the following ways:
- Hydra delays are more accurate;
- or they have similar (or even lower) accuracy, but the VM spends less
  time signaling the semaphore, so we get noticeably more 1 msec
  results.

No wonder benchmarks on the vanilla VM are faster:

Croquet VM:
    1 tinyBenchmarks
    '485768500 bytecodes/sec; 14611978 sends/sec'
    '486229819 bytecodes/sec; 14520007 sends/sec'
    '485768500 bytecodes/sec; 14554360 sends/sec'

    [ 1 tinyBenchmarks ] timeToRun
    5337
    5336
    5402

HydraVM:
    1 tinyBenchmarks
    '454706927 bytecodes/sec; 13731345 sends/sec'
    '455516014 bytecodes/sec; 13363453 sends/sec'
    '456327985 bytecodes/sec; 13210400 sends/sec'
    '453900709 bytecodes/sec; 13373136 sends/sec'

    [ 1 tinyBenchmarks ] timeToRun
    5692
    5770
    5796

Not sure if using timeToRun is fair here, because tinyBenchmarks itself
uses timeToRun to determine one of its parameters. This actually shows
the overhead of introducing the interpreter as an argument to each
function.
However, it would be interesting to add an option to build the VM using
thread-local storage, to minimize the impact of introducing multiple
interpreter instances. As a bonus we would have full compatibility with
the old primitives, because we wouldn't need to pass the interpreter
instance as an argument. But it was a design choice, and we decided to
pass the interpreter as an extra argument instead of using thread-local
storage.

--
Best regards,
Igor Stasenko AKA sig.