Problematic code in Delay>checkDelayedTasks

Louis LaBrunda
Hi All,

When trying to solve a problem that might have something to do with Delays not firing correctly, I found some suspect code in Delay>checkDelayedTasks.  The code is seldom if ever executed, so I doubt it has anything to do with my problem but someone should take a look at it anyway.

Delay>checkDelayedTasks is run when the OS fires the timer interrupt.  This is normally every 100 milliseconds.  Delay>checkDelayedTasks uses Time>millisecondClockValue, which is the number of milliseconds since the OS was booted (at least on Windows).  This wraps every 49+ days.  There is code in Delay>checkDelayedTasks to test whether Time>millisecondClockValue has gotten smaller.  This shouldn't happen other than when it wraps around to zero (0) every 49+ days.  The comments say that Time>millisecondClockValue could get smaller when daylight saving time ends, but my tests show that this isn't the case.  But since Time>millisecondClockValue is outside the control of the Smalltalk image code or its VM, it should be tested for getting smaller other than by wrapping around.  This leads me to the suspect code.
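As a rough check of the "49+ days" figure, the workspace snippet below computes the wrap period. It assumes the millisecond clock is an unsigned 32-bit counter, which the thread does not state explicitly; the variable names are illustrative only.

```smalltalk
| wrapMilliseconds millisecondsPerDay daysToWrap |
"Assumption: the millisecond clock is an unsigned 32-bit counter."
wrapMilliseconds := 2 raisedTo: 32.
millisecondsPerDay := 24 * 60 * 60 * 1000.
daysToWrap := (wrapMilliseconds / millisecondsPerDay) asFloat.
"Shows a value a little over 49.7"
Transcript show: daysToWrap printString; cr.
```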

There is a test to see if Time>millisecondClockValue has gotten smaller by no more than MillisecondClockValueEpsilon, which is set to 300000 milliseconds or 5 minutes.  If this is the case, the current time is assumed to be the PreviousClockValue and the code continues on as usual.  The collection of delays is checked to see if any have expired.  None would have, as they would have expired the last time through the code.  The problem is, if Time>millisecondClockValue jumped back about 5 minutes and then runs forward from there, all delays will take 5 minutes longer to fire.

The Time>millisecondClockValue will tick its way back to the saved PreviousClockValue but no delay will fire until the Time>millisecondClockValue gets past PreviousClockValue, which could be as long as MillisecondClockValueEpsilon or 5 minutes.
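To put numbers on the late firing described above, here is a small sketch with made-up clock values. It assumes, as described in this post, that a backward jump smaller than MillisecondClockValueEpsilon leaves the resumption times untouched; it is arithmetic only, not the VA Smalltalk source.

```smalltalk
| previous jumpedTo resumption intendedWait actualWait |
"Hypothetical values: the clock steps back 200 seconds."
previous := 1000000.
jumpedTo := 800000.
"A delay that was due 500 ms after the previous check."
resumption := previous + 500.
intendedWait := resumption - previous.     "500 ms"
"The clock has to tick from jumpedTo all the way up to resumption."
actualWait := resumption - jumpedTo.       "200500 ms"
Transcript
    show: 'intended wait: ', intendedWait printString; cr;
    show: 'actual wait: ', actualWait printString; cr.
```

The extra wait equals the size of the backward jump, which can be anything up to the 5 minute epsilon.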

The rollover code looks good for the rollover case:

rolloverTime := PreviousClockValue + InterruptPeriod.

"Adjust the resumption time for every process that
was suspended before the rollover. If the resumption time
was after the previous clock value but before the projected
rollover time then adjust the resumption time to be exactly
the projected rollover time (which is zero after the rollover)."
self delayedTasks do: [:task | | t |
    (t := task resumptionTime) >= PreviousClockValue
        ifTrue: [task resumptionTime: ((t - rolloverTime) max: 0)]].
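A worked example of that rollover adjustment, using plain numbers in place of real tasks. The values are hypothetical: a 32-bit clock that was 96 ms short of wrapping at the previous check, and an InterruptPeriod of 100 ms.

```smalltalk
| previousClockValue interruptPeriod rolloverTime resumptionTimes adjusted |
previousClockValue := 4294967200.
interruptPeriod := 100.
rolloverTime := previousClockValue + interruptPeriod.
"One delay due before the projected rollover time, one due 500 ms after it."
resumptionTimes := #(4294967250 4294967800).
adjusted := resumptionTimes collect: [:t |
    t >= previousClockValue
        ifTrue: [(t - rolloverTime) max: 0]
        ifFalse: [t]].
"adjusted now holds 0 and 500"
Transcript show: adjusted printString; cr.
```

After the wrap the clock restarts near zero, so the first delay is due immediately and the second about 500 ms later.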

But I would remove the MillisecondClockValueEpsilon test and adjust the delays like so:

timeAdjustment := PreviousClockValue - currentTime.
self delayedTasks do: [:task | task resumptionTime: (task resumptionTime - timeAdjustment)].

Note that I don't worry about the resumptionTime being beyond the PreviousClockValue; they all should be, and it won't hurt if they aren't.  Also, I don't worry about the resumptionTime going negative; the delay will still fire.

Lou

--
You received this message because you are subscribed to the Google Groups "VA Smalltalk" group.
To view this discussion on the web visit https://groups.google.com/d/msg/va-smalltalk/-/sQWX27sdlOsJ.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at http://groups.google.com/group/va-smalltalk?hl=en.

Re: Problematic code in Delay>checkDelayedTasks

Thomas Koschate-2
On Thursday, August 2, 2012 5:22:00 PM UTC-4, Louis LaBrunda wrote:

> Note that I don't worry about the resumptionTime being beyond the PreviousClockValue; they all should be, and it won't hurt if they aren't.  Also, I don't worry about the resumptionTime going negative; the delay will still fire.

If I remember correctly, it was actually an incident I was involved in  that resulted in this code being added.  Our application, running on an AIX box, was actually being affected by time apparently moving backward.  Delays were being expired prematurely, impacting our production line.  In actual fact, the computer's clock was running fast, and kept being reset by a Kerberos monitor that was watching a cluster of computers.

Which is not to say that there are no problems with the implementation, but there are reasons for it being the way it is.

Tom

Re: Problematic code in Delay>checkDelayedTasks

Louis LaBrunda
Hi Tom,

Your AIX experience is interesting; my experience is only with Windows.  On Windows, Time>millisecondClockValue does not change when the time-of-day clock changes.  Thinking about it, I can see that both adjusting Time>millisecondClockValue and not adjusting it when the time-of-day clock changes have merits.  Not adjusting Time>millisecondClockValue makes using it easier; adjusting it gives a more accurate indication of how long the OS has been running.  But since I don't see much value in knowing how long the OS has been running, I would rather have it unadjusted and always growing, at least until it wraps.

I still think my suggestion for a code change is good.  I don't think it will cause delays to end early or late.  However, if the Time>millisecondClockValue jumps ahead (because the system clock was running slow and was reset), I don't see (at least not yet) how to detect it and then delays will end early.  Maybe, the code should check for Time>millisecondClockValue changing by much more than InterruptPeriod in any direction and then deal with that as best it can.

Lou

Re: Problematic code in Delay>checkDelayedTasks

Thomas Koschate-2
On Friday, August 3, 2012 10:38:36 AM UTC-4, Louis LaBrunda wrote:
 
> Your AIX experience is interesting, my experience is only with Windows.

It was new for us, too.  We had just converted over to some new hardware, and before we found out about the time being reset from an external source, the only obvious explanation was that the hardware had become so fast that it was exceeding the speed of light.  :{)  Just to make it interesting, we were still running VA 4.5 at the time, although the rest of the world had moved on to 6.0 by then.

I'll try to find my old notes on the problem this weekend to see if they shed any light on the thinking process that led to the current solution.

Tom

Re: Problematic code in Delay>checkDelayedTasks

Louis LaBrunda
Hi Tom and Everyone,

I have been thinking about this over the weekend when I had nothing better to do, like trying to fall asleep.  Here's what I have concluded.  For the purpose of brevity, I will refer to Time>millisecondClockValue as the clock and to Delay>checkDelayedTasks as the code or this code.

Upon entering Delay>checkDelayedTasks the clock may have moved forward or backward.  I think there are two conditions where the clock may have moved backward and three where it may have moved forward.  Backward could be caused by the clock rolling over or by the clock being reset back in time a bit (I don't see this on Windows but I take Tom's word for it happening on AIX and I guess maybe UNIX or Linux).

Moving forward can be the clock ticking at a normal pace, a faster (or I guess slower) pace and jumping forward due to a clock reset.  Faster or slower running clocks are I think undetectable by the code, so we must ignore them.  Jumping forward, I think can be handled in a similar manner to a backward jump.

As Tom remembers things, the clock was running fast and a Kerberos monitor was resetting the clock backwards from time to time.  The current code was written to take this into account.  No offense intended to whoever wrote the code, but the current code doesn't solve the problem.  The clock running fast was the cause of the Delays being expired prematurely (not the backward jumps) and the current code doesn't test for this (which it can't) or even test for jumping ahead.  The current code has added a new problem by ignoring small jumps backward (up to 5 minutes), which then causes delays to expire late by the size of the backward jump, up to 5 minutes.

I suggest we test for both backward and forward jumps in time.  Backward jumps are easy, if currentTime is smaller than PreviousClockValue, the clock has jumped backward or rolled over.  Forward jumps are a little harder, if currentTime is greater than PreviousClockValue by very much more than InterruptPeriod, we can consider the clock having been reset and jumped forward.  Entering the code can of course take longer than InterruptPeriod but shouldn't take a lot longer, say 500 milliseconds.

This can be tested by checking to see if currentTime is between PreviousClockValue and PreviousClockValue + 500.

(currentTime between: PreviousClockValue and: PreviousClockValue + 500) ifFalse: [
   timeAdjustment := PreviousClockValue - currentTime + InterruptPeriod.
   self delayedTasks do: [:task | task resumptionTime: (task resumptionTime - timeAdjustment)]].
 
I think the time adjustment code works for both forward and backward jumps: timeAdjustment is positive for backward jumps (very large for rollovers), which lowers resumptionTime to match the clock, and negative for forward jumps, which pushes resumptionTime out to match.  Note that InterruptPeriod is added because it is likely that at least InterruptPeriod has passed whether the clock has jumped forward or backward.
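To try the adjustment on concrete numbers, the sketch below applies the same formula to three made-up cases: a backward jump, a forward jump, and a 32-bit rollover. The clock values and the 100 ms InterruptPeriod are assumptions for illustration, and each case assumes exactly one InterruptPeriod of real time has passed since the previous check.

```smalltalk
| interruptPeriod adjust backward forward rollover |
interruptPeriod := 100.
"Answers the adjusted resumption time, using the proposed timeAdjustment."
adjust := [:previous :current :resumption |
    resumption - (previous - current + interruptPeriod)].

"In each case the delay was due 500 ms after the previous check."
backward := adjust value: 1000000 value: 800000 value: 1000500.     "800400"
forward := adjust value: 1000000 value: 1060000 value: 1000500.     "1060400"
rollover := adjust value: 4294967200 value: 4 value: 4294967700.    "404"
Transcript
    show: backward printString; cr;
    show: forward printString; cr;
    show: rollover printString; cr.
```

In all three cases the adjusted resumption time lands 400 ms after the new clock value, that is, the original 500 ms minus the InterruptPeriod assumed to have elapsed.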

Does anyone know if the OS keeps multiple timer interrupts when asked (I think it may)?  If so, and we add another before the previous one has fired, Delay>checkDelayedTasks can be called before InterruptPeriod is up, but I don't think we care too much.

Lou


Re: Problematic code in Delay>checkDelayedTasks

Thomas Koschate-2
I've really got to get a VA environment set up at home again!  I did check my old notes, and discovered that we wrestled with this problem in mid-November of 2001.  The IBM team working on it found that it also affected Java applications, which wasn't a major issue for us at the time.  Not a lot of other details...

Tom

Re: Problematic code in Delay>checkDelayedTasks

Louis LaBrunda
Hey Tom,

On Monday, August 6, 2012 12:53:53 PM UTC-4, Thomas Koschate wrote:
> I've really got to get a VA environment set up at home again!  I did check my old notes, and discovered that we wrestled with this problem in mid-November of 2001.  The IBM team working on it found that it also affected Java applications, which wasn't a major issue for us at the time.  Not a lot of other details...
>
> Tom

I am impressed that you remember much of anything about this from 2001.

As for affecting Java apps it makes sense that Java could have the same kind of code.  Asking the OS to trigger an event periodically makes sense.  Using the millisecond clock to keep track of delays or when to issue callbacks looks like the only game in town.  And if it didn't change with time-of-day time changes (as it seems to be on Windows) it would be pretty sound.  I wonder how it is/was on OS2?

It would be very good if others could follow my logic and see if I am correct or off in the weeds.

It would also be interesting if others on AIX, UNIX and Linux systems could see if the millisecond clock changes with time-of-day changes.  This is fairly easy from the dev env: display Time>millisecondClockValue, change the time from the OS and display Time>millisecondClockValue again.  I am surprised that it does change, as I can see no good reason for it to change and can think of a good argument for it not to.
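A minimal workspace sketch of that check. It only uses Time millisecondClockValue, the selector discussed in this thread; the two halves are meant to be evaluated separately, with the OS time changed by hand in between.

```smalltalk
| before after |
before := Time millisecondClockValue.
"Evaluate up to here, change the time-of-day in the OS, then evaluate the rest."
after := Time millisecondClockValue.
"If the difference is close to the real time that passed, the millisecond
 clock did not follow the time-of-day change."
Transcript show: 'millisecond clock moved by: ', (after - before) printString; cr.
```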

Lou 


Re: Problematic code in Delay>checkDelayedTasks

Thomas Koschate-2
In reply to this post by Louis LaBrunda
On Monday, August 6, 2012 2:08:35 PM UTC-4, Louis LaBrunda wrote:

> I am impressed that you remember much of anything about this from 2001.

With my notebook to jog my memory, I'm good.  Just don't ask me about the 80's.  :{)
 
> As for affecting Java apps it makes sense that Java could have the same kind of code.  Asking the OS to trigger an event periodically makes sense.  Using the millisecond clock to keep track of delays or when to issue callbacks looks like the only game in town.  And if it didn't change with time-of-day time changes (as it seems to be on Windows) it would be pretty sound.  I wonder how it is/was on OS2?
>
> It would be very good if others could follow my logic and see if I am correct or off in the weeds.

I'll dig into things once I'm back at the office tomorrow, assuming that the wheels haven't fallen off something over the weekend.
 
> This is fairly easy from the dev env, by displaying Time>millisecondClockValue, change the time from the OS and displaying Time>millisecondClockValue again.

My notes mention a C program I wrote at the time to illustrate the behavior on different hardware, but, sadly, the code is lost in the mists of time.  My Unix account was cleaned out between my sentences there.
 
> I am surprised that it does change, as I can see no good reason for it to change and can think of a good argument for it not to.

As I mentioned originally, the time changes were a Kerberos artifact.  When using Kerberos tickets to provide authentication between machines in a cluster, it's vital that time be kept consistent between machines so that tickets don't expire unexpectedly.   Stand-alone Windows and Unix boxes don't have such a stringent requirement on their timekeeping, so they can get by with hitting an NTP server on a regular basis.  

That being said, it's still possible for their clocks to get out of synch with the NTP server, requiring a reset, but it's less frequent, and clocks on modern hardware may be better than they were on our Copper Node PowerPC AIX box at the time.  

Had that machine had a reasonably accurate clock, we may never have seen this problem, or it might have been a fairly infrequent issue that would have really had us scratching our heads.  As it was, we were watching the system quite closely - it was a hardware upgrade that we'd prepared for by freezing code and running it on the old system for a month beforehand, so we were fairly confident of software solidity.

Tom

Re: Problematic code in Delay>checkDelayedTasks

Louis LaBrunda
Hi Tom,

On Monday, August 6, 2012 2:30:09 PM UTC-4, Thomas Koschate wrote:
> As I mentioned originally, the time changes were a Kerberos artifact.  When using Kerberos tickets to provide authentication between machines in a cluster, it's vital that time be kept consistent between machines so that tickets don't expire unexpectedly.   Stand-alone Windows and Unix boxes don't have such a stringent requirement on their timekeeping, so they can get by with hitting an NTP server on a regular basis.

I understand the use and need of Kerberos or other tools to keep the time-of-day clocks in computers in sync and as accurate as possible.  But the millisecond clock is a different animal or clock.  It is barely a clock in the common sense and not one in the sense that it keeps the time-of-day.  It is (I think) just a place in memory where something like 10, 15, 16 or 17 (it is kept in milliseconds but is not accurate to the millisecond) is added every time the OS's timer interrupt fires.  So, I can see why the time-of-day clock can and should be reset when needed but I don't see why the millisecond clock should get adjusted.

If the millisecond clock is defined as the number of milliseconds since the OS was booted, then I guess it should get adjusted but I can think of better ways to know how long the OS has been running like keeping a time stamp of when it was booted.  This wouldn't need adjustment and a little math gives the duration.

If the definition is just an ever increasing (until it rolls over) number of milliseconds, it shouldn't get adjusted, and that makes its use for managing delays and callbacks much easier.

That said, I hope my suggested code deals with the adjustment of the millisecond clock with very little extra code than is needed for rollover.

Lou

Re: Problematic code in Delay>checkDelayedTasks

Thomas Koschate-2
In reply to this post by Louis LaBrunda
On Monday, August 6, 2012 12:29:50 PM UTC-4, Louis LaBrunda wrote:

> The clock running fast was the cause of the Delays being expired prematurely (not the backward jumps) and the current code doesn't test for this (which it can't) or even test for jumping ahead.

Actually, Delays were being expired prematurely because of the backward jumps.  As originally implemented, the delay timer was seeing that the new time was less than the old time, and therefore assuming clock rollover.  Based on that assumption, it was decided that there was a major shift in time forward, and any Delays should be expired.
 
> The current code has added a new problem by ignoring small jumps backward (up to 5 minutes), which then causes delays to expire late by the amount of time of the up to 5 minute backward jump.

In our case, at least, a timer expiring a little later is less problematic than one expiring too soon.  We use delays to control timeouts on things such as torque gun controls (we build cars here), and if the delay expires too soon, the assembly line worker does not have enough time to complete his or her task on the vehicle.

> This can be tested by checking to see if currentTime is between PreviousClockValue and PreviousClockValue + 500.
>
> (currentTime between: PreviousClockValue and: PreviousClockValue + 500) ifFalse: [
>    timeAdjustment := PreviousClockValue - currentTime + InterruptPeriod.
>    self delayedTasks do: [:task | task resumptionTime: (task resumptionTime - timeAdjustment)]].
 
I don't actually see how your code is materially different from what is already there, except for tighter values on the test range.

Tom

Re: Problematic code in Delay>checkDelayedTasks

Louis LaBrunda
Hey Tom,

On Tuesday, August 7, 2012 8:41:57 AM UTC-4, Thomas Koschate wrote:
> On Monday, August 6, 2012 12:29:50 PM UTC-4, Louis LaBrunda wrote:
>
>> The clock running fast was the cause of the Delays being expired prematurely (not the backward jumps) and the current code doesn't test for this (which it can't) or even test for jumping ahead.
>
> Actually, Delays were being expired prematurely because of the backward jumps.  As originally implemented, the delay timer was seeing that the new time was less than the old time, and therefore assuming clock rollover.  Based on that assumption, it was decided that there was a major shift in time forward, and any Delays should be expired.

Ah I see it now, the rollover code assumed a rollover:))  The problem is the math only took care of a rollover.  My code, for the time adjustment works (I think) for both rollovers and small jumps forward and backward.  It looks too simple to cover all those conditions, so others should please think it through and tell me if I'm dreaming.

timeAdjustment := PreviousClockValue - currentTime + InterruptPeriod.

>> The current code has added a new problem by ignoring small jumps backward (up to 5 minutes), which then causes delays to expire late by the amount of time of the up to 5 minute backward jump.
>
> In our case, at least, a timer expiring a little later is less problematic than one expiring too soon.  We use delays to control timeouts on things such as torque gun controls (we build cars here), and if the delay expires too soon, the assembly line worker does not have enough time to complete his or her task on the vehicle.

Got it.  I must have had my time adjustment code in my head and not the old code that really only did rollover correctly and didn't realize how the rollover ended your delays.

>> This can be tested by checking to see if currentTime is between PreviousClockValue and PreviousClockValue + 500.
>>
>> (currentTime between: PreviousClockValue and: PreviousClockValue + 500) ifFalse: [
>>    timeAdjustment := PreviousClockValue - currentTime + InterruptPeriod.
>>    self delayedTasks do: [:task | task resumptionTime: (task resumptionTime - timeAdjustment)]].
>
> I don't actually see how your code is materially different from what is already there, except for tighter values on the test range.
>
> Tom

The test range covers both forward and backward jumps; the old code only did rollovers and ignored small backward jumps.  The time adjustment line covers the size of the forward or backward jump, assuming my math does what I think it does.

Lou 

Re: Problematic code in Delay>checkDelayedTasks

Thomas Koschate-2
Lou, I'm still not seeing a significant difference between what you've done and what's already implemented except for using 500 ms instead of 300000.

"currentTime between: PreviousClockValue and: PreviousClockValue + 500" is equivalent to "PreviousClockValue - currentTime < 500", so the same delays would be affected.  The only other change is that you're touching all delays rather than just those that should have expired since the last check.

Tom

Re: Problematic code in Delay>checkDelayedTasks

Louis LaBrunda
Tom,

On Tuesday, August 7, 2012 2:00:15 PM UTC-4, Thomas Koschate wrote:
> Lou, I'm still not seeing a significant difference between what you've done and what's already implemented except for using 500 ms instead of 300000.
>
> "currentTime between: PreviousClockValue and: PreviousClockValue + 500" is equivalent to "PreviousClockValue - currentTime < 500", so the same delays would be affected.  The only other change is that you're touching all delays rather than just those that should have expired since the last check.
>
> Tom

The range test is important because it tests for rollover and any jump backward and a significant jump forward in the millisecond clock.  The real difference is in the time adjustment:

timeAdjustment := PreviousClockValue - currentTime + InterruptPeriod.

compared to the rollover based adjustment:

rolloverTime := PreviousClockValue + InterruptPeriod.

The rollover version assumes that currentTime (the millisecond clock) is basically zero (0).  My code, by subtracting currentTime (which is not small for small jumps backward), handles small jumps backward.  For jumps forward, currentTime is larger than PreviousClockValue, which results in a negative timeAdjustment (not worrying about InterruptPeriod for a second).  Subtracting the negative timeAdjustment adds time to the delays (for forward jumps), pushing them into the future, which makes sense if the clock jumped ahead.
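A side-by-side sketch of the two adjustments for a small backward jump, using hypothetical clock values and a 100 ms InterruptPeriod. It is plain arithmetic on the two formulas quoted above, not the VA Smalltalk method itself.

```smalltalk
| previous current interruptPeriod resumption rolloverTime viaRollover timeAdjustment viaAdjustment |
previous := 1000000.
"Hypothetical: the clock stepped back 200 seconds."
current := 800000.
interruptPeriod := 100.
"A delay due 500 ms after the previous check."
resumption := previous + 500.

"Rollover-style adjustment, which treats the new clock as starting near zero."
rolloverTime := previous + interruptPeriod.
viaRollover := (resumption - rolloverTime) max: 0.     "400"

"Proposed adjustment, which takes the new clock value into account."
timeAdjustment := previous - current + interruptPeriod.
viaAdjustment := resumption - timeAdjustment.          "800400"

Transcript
    show: 'rollover-style: ', viaRollover printString; cr;
    show: 'proposed: ', viaAdjustment printString; cr.
```

With the rollover-style result (400) far below the current clock (800000), the delay would be treated as already due; the proposed result (800400) is still 400 ms in the future.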

If you have the time (puns are flying all over this post) try some numbers.  I will try to post some numbers to show the various possibilities.

Lou
 


Re: Problematic code in Delay>checkDelayedTasks

Thomas Koschate-2
On Tuesday, August 7, 2012 2:20:11 PM UTC-4, Louis LaBrunda wrote:

> If you have the time (puns are flying all over this post) try some numbers.  I will try to post some numbers to show the various possibilities.

By all means, bring on some numbers.   However, I'm worried that we may have headed off down some rabbit hole that may or may not have anything to do with the problem you're really having.  Could we get some more details on that without giving away any trade secrets?

Here's how I interpret the code as it currently exists:

If the current timestamp is greater than or equal to the previous timestamp, then time is progressing more or less as we expect.
If it isn't, let's see if the time has moved back more than our tolerated amount (MillisecondClockValueEpsilon).
If it hasn't, we'll just pretend that the current timestamp is the same as the previous timestamp.
Otherwise, we'll reset the resumption times of all the delays that should have resumption timestamps after the previous timestamp to be some time in the period after the previous timestamp plus the timer resolution (100ms).

Once we've done all that, let's see if anybody's resumption time is less than the current time, and resume them.
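
The steps above can be sketched in a workspace as follows. This is only an illustration of the description, not the actual Delay>checkDelayedTasks source: the variable names, the sample values, and the way the rollover branch rewrites resumption times (subtracting PreviousClockValue + InterruptPeriod) are assumptions.

```smalltalk
| epsilon interruptPeriod previousClockValue currentTime resumptionTimes expired |
epsilon := 300000.               "MillisecondClockValueEpsilon, 5 minutes"
interruptPeriod := 100.          "timer resolution in milliseconds"
previousClockValue := 500000.    "hypothetical clock value at the last check"
currentTime := 499000.           "hypothetical: the clock moved back one second"
resumptionTimes := OrderedCollection new.
resumptionTimes add: 500050; add: 500500.

currentTime < previousClockValue ifTrue: [
    previousClockValue - currentTime <= epsilon
        ifTrue: [
            "Small step backward: pretend the clock did not move."
            currentTime := previousClockValue]
        ifFalse: [
            "Large step backward: treat it as a rollover and rebase the delays."
            resumptionTimes := resumptionTimes collect: [:each |
                each - (previousClockValue + interruptPeriod)]]].

"Delays whose resumption time is less than the current time would be resumed."
expired := resumptionTimes select: [:each | each < currentTime].
previousClockValue := currentTime.

Transcript cr; show: 'expired> '; show: expired printString; cr.
```

With these sample values the one-second step back is within the tolerance, so the current time is pinned to the previous value and neither delay is reported as expired.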

The only way I could see you having a problem with this is if the MillisecondClockValueEpsilon is larger than you can tolerate.

However, bring on some numbers, and let's see.

Tom


Re: Problematic code in Delay>checkDelayedTasks

Louis LaBrunda
Hey Tom,

Run this code in a workspace, play with the numbers if you like.

| tests |
tests := OrderedCollection new.
"Each entry: name, PreviousClockValue, current time, delay resumption time."
tests
    add: #('Rollover' 4294967200 10 4294967400);
    add: #('Jump Back' 4294967290 4294967200 4294967395);
    add: #('Jump Forward' 4294966100 42949676000 4294966205).

Transcript cr.
tests do: [:t | | pcv ct adj delayTo |
    pcv := t second.
    ct := t third.
    adj := pcv - ct + 100.    "timeAdjustment with an InterruptPeriod of 100"
    delayTo := t fourth.
    Transcript show: t first; show: ' - PreviousClockValue>'; show: pcv printString;
        show: ', CurrentTime>'; show: ct printString; show: ', Adjustment>'; show: adj printString;
        show: ', delayToWas>'; show: delayTo printString; show: ', delayToIs>'; show: (delayTo - adj) printString; cr.
].

You should see in the transcript:

Rollover - PreviousClockValue>4294967200, CurrentTime>10, Adjustment>4294967290, delayToWas>4294967400, delayToIs>110
Jump Back - PreviousClockValue>4294967290, CurrentTime>4294967200, Adjustment>190, delayToWas>4294967395, delayToIs>4294967205
Jump Forward - PreviousClockValue>4294966100, CurrentTime>42949676000, Adjustment>-38654709800, delayToWas>4294966205, delayToIs>42949676005

On Tuesday, August 7, 2012 2:44:35 PM UTC-4, Thomas Koschate wrote:
> On Tuesday, August 7, 2012 2:20:11 PM UTC-4, Louis LaBrunda wrote:
>
>> If you have the time (puns are flying all over this post) try some numbers.  I will try to post some numbers to show the various possibilities.
>
> By all means, bring on some numbers.   However, I'm worried that we may have headed off down some rabbit hole that may or may not have anything to do with the problem you're really having.  Could we get some more details on that without giving away any trade secrets?
>
> Here's how I interpret the code as it currently exists:
>
> If the current timestamp is greater than or equal to the previous timestamp, then time is progressing more or less as we expect.
> If it isn't, let's see if the time has moved back more than our tolerated amount (MillisecondClockValueEpsilon).
> If it hasn't, we'll just pretend that the current timestamp is the same as the previous timestamp.
> Otherwise, we'll reset the resumption times of all the delays that should have resumption timestamps after the previous timestamp to be some time in the period after the previous timestamp plus the timer resolution (100ms).
>
> Once we've done all that, let's see if anybody's resumption time is less than the current time, and resume them.
>
> The only way I could see you having a problem with this is if the MillisecondClockValueEpsilon is larger than you can tolerate.
>
> However, bring on some numbers, and let's see.
>
> Tom

I am having two odd problems that may have something to do with delays (I will explain more below), and you are correct: I don't think they have anything to do with the problems in the Delay code that I have been explaining.  To recap those problems:

1) Short jumps backward (within MillisecondClockValueEpsilon) will extend delays, which can cause socket connections to time out and/or delay work that comes or goes through those connections by up to MillisecondClockValueEpsilon.
2) Short jumps backward (longer than MillisecondClockValueEpsilon) are treated like a rollover and would have your original problem.
3) Jumps forward of any size are not handled.

I think my code fixes all of the above without the need for MillisecondClockValueEpsilon.

Now to my two odd problems.  The first is one I'm not too concerned with.  When the time-of-day clock is set back an hour when summer time ends, some (but maybe not all) of my programs running as Windows NT services (there may be 5 to 15 running on a box at once) will go into 100% CPU usage.  The programs seem to keep doing useful work but all the idle time gets used up in some loop.  I can't find any loops that would cause this, so it seems as if at least some delays in loops are ending too soon.

The second problem is more of a concern.  The same programs run 24/7.  They have a few forks or processes running.  All the processes loop on a boolean that is true until the service is told to stop, at which time the boolean is set to false.  All these loops end with a delay of various durations.  For no reason that I can discern, one of the programs will have one or more of its processes stop running.  By stop running I mean the process is still ready but it doesn't seem to get dispatched.  They are of a high priority and should easily be ready to run but don't seem to.
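
A loop of the kind described here might look like the following minimal sketch. The variable names, the 200 ms delay, and the counter standing in for real work are hypothetical; the real services are not shown in this thread.

```smalltalk
| keepRunning passes worker |
keepRunning := true.
passes := 0.

"Worker process: do some work, then delay so other processes and the OS can run."
worker := [
    [keepRunning] whileTrue: [
        passes := passes + 1.               "stand-in for the real work"
        (Delay forMilliseconds: 200) wait]] fork.

"Let the worker run for a while, then tell it to stop, as the service would."
(Delay forSeconds: 1) wait.
keepRunning := false.

Transcript cr; show: 'passes> '; show: passes printString; cr.
```

The worker leaves its loop the next time its delay ends and it sees the flag set to false, so a delay that never resumes its process would leave the loop stuck in exactly the way described.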

I know this because one of the processes writes a line to the TTY log every two minutes.  When this process seems to stop, these entries stop.  This happened to one of the programs a week or so ago.  The main work process of the program (at a lower priority than the process writing to the log) kept running.  After a while, I think close to an hour, the problem went away.  The problem seldom gets better and it takes a restart of the program to fix it, but last week one got better.  When the program was stopped, information about all the processes was dumped to another log.  The TTY logging process showed it was ready to run and at the correct priority.

My latest theory (I'm trying to think of how to test it) is that a delay ends and its process is set to ready but NOT added back to its priority queue.  In the case of the program where things went back to normal, maybe the process got put back into the queue.  The code at the end of Delay>checkDelayedTasks that sets processes ready and puts them back in their queue is not wrapped in a critical block and interrupts are not disabled, so maybe with some bad timing, the process may not get put back in its priority queue.

Lou


Re: Problematic code in Delay>checkDelayedTasks

Thomas Koschate-2
On Tuesday, August 7, 2012 4:14:26 PM UTC-4, Louis LaBrunda wrote:

> Run this code in a workspace, play with the numbers if you like.

I'm guessing these numbers are actual observations you've gathered from your testing.  I assume that you had some printStrings embedded in the #checkDelayedTasks method in order to gather them.

Based on that, I'm not sure I see the problem.

> My latest theory (I'm trying to think of how to test it) is that a delay ends and its process is set to ready but NOT added back to its priority queue.  In the case of the program where things went back to normal, maybe the process got put back into the queue.  The code at the end of Delay>checkDelayedTasks that sets processes ready and puts them back in their queue is not wrapped in a critical block and interrupts are not disabled, so maybe with some bad timing, the process may not get put back in its priority queue.

While I'm loath to suggest a major architectural change, have you looked at SstCron and SstCronEntry as a way to manage your pool of tasks?  Your problems may not go away, but a larger chunk of the issue is now squarely back in Instantiations code, and you're in a much better position to toss the problem to them.

Tom


Re: Problematic code in Delay>checkDelayedTasks

Louis LaBrunda


On Wednesday, August 8, 2012 9:13:58 AM UTC-4, Thomas Koschate wrote:
> On Tuesday, August 7, 2012 4:14:26 PM UTC-4, Louis LaBrunda wrote:
>
>> Run this code in a workspace, play with the numbers if you like.
>
> I'm guessing these numbers are actual observations you've gathered from your testing.  I assume that you had some printStrings embedded in the #checkDelayedTasks method in order to gather them.

No, I made the numbers up.  They are not meant to show the problem but to show how my code would adjust the delays.  The first shows the millisecond clock when it is close to a rollover, as it rolls over.  My code adjusts the delay to a small number close to the zero rollover point.  The second shows a small jump backward; it could be anywhere in the millisecond range of 0 to 4294967295.  My code adjusts the delay to a number a little lower than the previous clock value, ready to move forward with the new clock value.  The third example shows a jump forward (again, it could be anywhere in the normal range); the original code misses this possibility.  My code adjusts the delay to a reasonable amount later than the previous clock.

> Based on that, I'm not sure I see the problem.
>
>> My latest theory (I'm trying to think of how to test it) is that a delay ends and its process is set to ready but NOT added back to its priority queue.  In the case of the program where things went back to normal, maybe the process got put back into the queue.  The code at the end of Delay>checkDelayedTasks that sets processes ready and puts them back in their queue is not wrapped in a critical block and interrupts are not disabled, so maybe with some bad timing, the process may not get put back in its priority queue.
>
> While I'm loath to suggest a major architectural change, have you looked at SstCron and SstCronEntry as a way to manage your pool of tasks?  Your problems may not go away, but a larger chunk of the issue is now squarely back in Instantiations code, and you're in a much better position to toss the problem to them.
>
> Tom

I will try to take a look at SstCron and SstCronEntry.  But I think the Delay code is just wrong and should be fixed.  Once I am completely sure of this I will formally report it to Instantiations.  As for my problems, I'm not sure what the cause is.  But my use of delay and anything to do with time is very simple. Basically, the loops do whatever work they can do and delay just to let other processes and the OS run for a while.

Lou


Re: Problematic code in Delay>checkDelayedTasks

Louis LaBrunda
In reply to this post by Thomas Koschate-2
Tom,

Just a heads up about the current code.  If the computer involved (or any other) were to start making up for running the clock too fast (just kidding) and start running slow, the time adjustment system would cause jumps of the millisecond clock into the future.  With the current code, delays would again end prematurely and you would be back to your original problem.

Lou
