Hello all,
I suspect Dr. Heisenberg is having a very good laugh at my expense :) I
have a state machine of sorts that is, well, out of sorts. To get a
handle on the problem, I have added some logging, and the fun part is
that to make the logs intelligible, I had to add some critical sections.
It seems almost guaranteed that the logging will affect the behavior,
if only a small amount. The logging is optional, and I can direct it to
various streams, including Trace.

On the enhancement front, I found it useful to use Trace's mutex, and
added a method to expose it. Should that be included in the base system?

The state machine controls serial communications (two way) with an
external device. It has send and receive semaphores, and they get
signalled under all kinds of crazy conditions - too many, it turns out.
The synchronized log entries for the state machine running against a
disconnected port show that it is piling up excess signals on the read
semaphore. While I might be able to get away with it, it is not
intended, and really should be fixed.

Any thoughts on using #set vs. #signal? I suspect I would be better off
to figure out exactly when I need to signal the semaphore and "get it
right". Do you agree? Or is this a situation that screams for #set?

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
Bill Schwab wrote:
> Hello all,
>
> I suspect Dr. Heisenberg is having a very good laugh at my expense :) I
> have a state machine of sorts that is, well, out of sorts. To get a
> handle on the problem, I have added some logging, and the fun part is
> that to make the logs intelligible, I had to add some critical sections.
> It seems almost guaranteed that the logging will affect the behavior,
> if only a small amount.

As in - the window changes from slightly open to slightly more than
slightly open, and back again. Fun, isn't it? Seems every time I go
that route, I end up solving the problem without much help from the
(increasingly complex) logging.

> The logging is optional, and I can direct it to
> various streams, including Trace.
>
> On the enhancement front, I found it useful to use Trace's mutex, and
> added a method to expose it. Should that be included in the base system?
>
> The state machine controls serial communications (two way) with an
> external device. It has send and receive semaphores, and they get
> signalled under all kinds of crazy conditions - too many, it turns out.

Seems to me there shouldn't be very many places that signalling occurs.
I'd guess 1-3 for each direction, tops. I suspect refactoring toward
OAOO will help.

> The synchronized log entries for the state machine running against a
> disconnected port show that it is piling up excess signals on the read
> semaphore. While I might be able to get away with it, it is not
> intended, and really should be fixed.
>
> Any thoughts on using #set vs. #signal? I suspect I would be better off
> to figure out exactly when I need to signal the semaphore and "get it
> right". Do you agree? Or is this a situation that screams for #set?

Depends on how you're wired, Bill. I'd be inclined to figure out what
the state machine is, and either eliminate the extraneous signalling
directly, or recognize it as non-determinism, and convert the NDFA to
an equivalent DFA. Since #set masks extra signals, I'd stick with
#signal, until I *knew* that #set would only be masking epsilon edges.

But you could take the other route, and just ensure that it passes
every test case your app cares about.

If you can increase the level of detail, either here, or in an off-line
message, I can take a deeper look.

Regards,

-cstb
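[To make the #signal/#set difference concrete, a small workspace
snippet, assuming only the behaviour described above - #signal counts
excess signals, #set masks the extras:]

    | sem |
    sem := Semaphore new.
    sem signal; signal; signal.   "three excess signals pile up..."
    sem wait; wait; wait.         "...so three waits sail straight through"

    sem := Semaphore new.
    sem set; set; set.            "#set masks the extras: one excess signal"
    sem wait.                     "one wait proceeds; a second would block"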
> Seems every time I go that route, I end up solving
> the problem without much help from the (increasingly
> complex) logging.

In this case, I suspect it has already alerted me to a very important
problem.

> Seems to me there shouldn't be very many places
> that signalling occurs. I'd guess 1-3 for each
> direction, tops. I suspect refactoring toward
> OAOO will help.

It is helping as I work toward it. Ironically, the concern is partly
due to the changes made to date. Basically, I start these machines and
walk away. They run for weeks at a time - 49 days and change - until
Delay breaks, or until patches force me to reboot. During that time,
external devices get unplugged, partially unplugged, powered off and
restarted. Regardless of unit tests, I am going to be nervous for a few
weeks :(

> Depends on how you're wired, Bill.
> I'd be inclined to figure out what the state machine is,
> and either eliminate the extraneous signalling directly,
> or recognize it as non-determinism, and convert the NDFA
> to an equivalent DFA.

That's an interesting view of it.

> Since #set masks extra signals,
> I'd stick with #signal, until I *knew* that #set would
> only be masking epsilon edges.

Fair enough - thanks for confirming my hunch.

> But you could take the other route, and just ensure that
> it passes every test case your app cares about.

Unfortunately, the tests require "random" events in real hardware; it
either runs for weeks at a time under battle conditions or it doesn't.
I have a growing collection of tests that verify the protocols, but the
timing is more difficult to extract into unit tests.

> If you can increase the level of detail, either here,
> or in an off-line message, I can take a deeper look.

Thanks! I will probably stumble forward with trying to eliminate the
extraneous signals as well as I can, and turn it loose on the world (or
vice versa). A more careful analysis will take some time, but I will
keep it in mind.

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
Bill Schwab wrote:
> It is helping as I work toward it. Ironically, the concern is partly
> due to the changes made to date. Basically, I start these machines and
> walk away. They run for weeks at a time - 49 days and change - until
> Delay breaks,

which is due to a 32-bit millisecond clock overflowing.

    (2 raisedTo: 32) / (49 * 24 * 60 * 60 * 1000.0) = 1.0145

which is why VW went to a 64-bit microsecond clock...

--
_______________,,,^..^,,,____________________________
Eliot Miranda                 Smalltalk - Scene not herd
In reply to this post by Schwab,Wilhelm K
Bill,
> I have added some logging, and the fun part is
> that to make the logs intelligible, I had to add some critical sections.
> It seems almost guaranteed that the logging will affect the behavior,
> if only a small amount. The logging is optional, and I can direct it to
> various streams, including Trace.

I can't help with your design problem with semaphores, but one way of
avoiding the Heisenberg effect might be to use Process-specific log
streams. There's a "spare" slot in the definition of Process, called
"tls" (I can't /guess/ what that might stand for ;-) which you could
hijack for your purposes (I've done so before...). The log streams
would necessarily go to file(s), but if you timestamp the data you
should be able to reassemble the sequence of events mechanically.

    -- chris
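[A sketch of the idea. The #tls/#tls: accessors are assumed to exist
per the above, and the FileStream creation call and file naming are
made up - substitute whatever your image actually provides:]

    activeLogStream
        "Answer a log stream private to the running process, lazily
        stashed in the spare 'tls' slot, so writing needs no
        cross-process locking."
        | p |
        p := Processor activeProcess.
        ^p tls ifNil: [
            p tls: (FileStream write: 'trace-', p identityHash printString, '.log').
            p tls]

    log: aString
        self activeLogStream
            print: Time millisecondClockValue;
            nextPutAll: ': ';
            nextPutAll: aString;
            cr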
In reply to this post by Eliot Miranda
Eliot Miranda wrote:
> which is why VW went to a 64-bit microsecond clock...

That's merely putting the problem off for a few hundred million years.
Bill wants his machines to be /reliable/...

    -- chris
Chris Uppal wrote:
> Eliot Miranda wrote:
>
>> which is why VW went to a 64-bit microsecond clock...
>
> That's merely putting the problem off for a few hundred million years.
> Bill wants his machines to be /reliable/...

:)

But you're a little optimistic...

    (2 raisedTo: 64) / (365.5 * 24 * 3600 * 1000000.0) - (2005 - 1901)
        = 584038.0

:)

--
_______________,,,^..^,,,____________________________
Eliot Miranda                 Smalltalk - Scene not herd
In reply to this post by Chris Uppal-3
Chris,
>> which is why VW went to a 64-bit microsecond clock...
>
> That's merely putting the problem off for a few hundred million years.
> Bill wants his machines to be /reliable/...

I certainly understand OA's caution; a rush to fix it could be
disastrous. However, it is somewhat unusual to have somebody other than
Microsoft as a limiting factor in reliability.

Here is a thought: machines seem to know the date and time they
started, even beyond the 49 day cutoff. Is there some way we could tap
that info and create a long integer time stamp that is accurate enough
to make delays work?

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
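[A rough sketch of what that could look like; the instance variable
names and the method are invented, and something would have to call it
at least once per 49-day wrap for the counter to stay honest:]

    millisecondTimestamp
        "Answer milliseconds since an absolute epoch, surviving 32-bit
        wraps of the millisecond clock. startMillis is the machine's
        start date/time converted to epoch milliseconds; lastTick and
        wraps both start at 0."
        | tick |
        tick := Time millisecondClockValue.
        tick < lastTick ifTrue: [wraps := wraps + 1].  "clock wrapped past 2**32"
        lastTick := tick.
        ^startMillis + (wraps * (2 raisedTo: 32)) + tick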
In reply to this post by Chris Uppal-3
Chris,
> I can't help with your design problem with semaphores, but one way of
> avoiding the Heisenberg effect might be to use Process-specific log
> streams. There's a "spare" slot in the definition of Process, called
> "tls" (I can't /guess/ what that might stand for ;-) which you could
> hijack for your purposes (I've done so before...). The log streams
> would necessarily go to file(s), but if you timestamp the data you
> should be able to reassemble the sequence of events mechanically.

That is a great idea. Actually, I think I could collect data into write
streams over strings, and then output the whole mess after the fact
based on time stamps. I think the problem is resolving itself (one of
the semaphores is probably not necessary!), but I will keep this in
mind for this or future problems.

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
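[A minimal sketch of that in-memory variant. Names are illustrative,
#at:ifAbsentPut: is assumed to be available, and registering a process
in the shared dictionary would still want a mutex:]

    | logs log merged |
    logs := IdentityDictionary new.

    "During the run, each process appends timestamped entries to its
    own collection - no shared stream, so no critical section on write."
    log := logs at: Processor activeProcess
        ifAbsentPut: [OrderedCollection new].
    log add: Time millisecondClockValue -> 'read semaphore signalled'.

    "After the fact, merge everything and sort by timestamp."
    merged := logs values
        inject: OrderedCollection new
        into: [:all :each | all addAll: each; yourself].
    (merged asSortedCollection: [:a :b | a key <= b key])
        do: [:each | Transcript print: each key; tab; nextPutAll: each value; cr]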
In reply to this post by Chris Uppal-3
Chris Uppal wrote:
>
> I can't help with your design problem with semaphores, but one way of
> avoiding the Heisenberg effect might be to use Process-specific log
> streams. There's a "spare" slot in the definition of Process, called
> "tls" (I can't /guess/ what that might stand for ;-) which you could
> hijack for your purposes (I've done so before...). The log streams
> would necessarily go to file(s), but if you timestamp the data you
> should be able to reassemble the sequence of events mechanically.
>

TLS usually stands for thread-local storage, which seems appropriate in
this context.

Randy

--
Randy Coulman
NOTE: Reply-to: address is spam-guarded.
Reassemble the following to reply directly:
rcoulman at charter dot net
In reply to this post by Schwab,Wilhelm K
Chris,
> I think the problem is resolving itself (one of the semaphores is
> probably not necessary!), but I will keep this in mind for this or
> future problems.

So much for that theory - I found the reason for the "extra" semaphore.
However, I now better understand why it was useful, and can probably
arrive at a much simpler design that is sufficiently flexible.

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
In reply to this post by Eliot Miranda
Eliot Miranda wrote:
> > > which is why VW went to a 64-bit microsecond clock...
> >
> > That's merely putting the problem off for a few hundred million
> > years. Bill wants his machines to be /reliable/...
>
> :)
>
> But you're a little optimistic...
>
> (2 raisedTo: 64) / (365.5 * 24 * 3600 * 1000000.0) - (2005 - 1901)
>     = 584038.0

Ah, you said /micro/second clock, I missed that bit.

Dolphin does have a microsecond clock (only if the hardware's
available -- or so MSDN warns), but that's not used by the Delay
class :-(

    -- chris
In reply to this post by Schwab,Wilhelm K
Bill,
> Is there some way we could tap that info
> and create a long integer time stamp that is accurate enough to make
> delays work?

If I'm reading the code for Delay correctly, then the effect of a clock
wraparound is to make the Delay wake up early. If so, then I imagine
that it would be easy (conceptually) to create a LongDelay which knew
when (date and time) it should wake up, and kept doing:

    (Delay forSeconds: <whatever>) wait

until the desired time had arrived (or passed). It would simply
continue the loop if a delay expired too soon.

One unwanted complication is that it would probably have to handle
daylight saving changes. At least that would be true for the general
case -- I don't know whether you need that feature.

I started to hack something together, but was reminded again of what a
dreadful state Dolphin's time and date classes are in (none of Date,
Time, TimeStamp, SYSTEMTIME, FILETIME, and LocalFILETIME work together
properly, and between the lot of them don't seem to provide a complete
feature-set), and gave it up as No Fun At All. Still not difficult,
though, just messy...

    -- chris
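[The core of that loop might look something like this - a sketch only.
It assumes a LongDelay with a 'wakeTime' instance variable holding a
TimeStamp, and that TimeStamp current and #asSeconds behave as in most
dialects; none of these names are Dolphin's actual API:]

    wait
        "Sleep in short chunks until wakeTime has arrived or passed.
        A chunk that expires early (say, across a clock wrap) just
        re-enters the loop and sleeps again."
        | remaining |
        [remaining := wakeTime asSeconds - TimeStamp current asSeconds.
         remaining > 0] whileTrue: [
            (Delay forSeconds: (remaining min: 60)) wait]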
Chris,
>
> If I'm reading the code for Delay correctly, then the effect of a clock
> wraparound is to make the Delay wake up early.

I respectfully doubt that is the case, only because it seems
inconsistent with the behavior of the affected systems. Their mission
is to wake up every 30 seconds or so, and they fail to wake up after
the wrap. Of course, sometimes they did wake up (which made the bug
even harder to find) if they managed to not create a critical delay
that spanned the wrap.

Technical babbling aside, it seems that if the delays did wake up
early, then the machines would simply fire off another 30 second work
cycle a little earlier than expected, and my pager wouldn't go off. Of
course now that I understand the bug, I reboot before it happens - most
of the time ~:0 The hard part is explaining to somebody that they
suffered because I couldn't count to seven :(

> If so, then I imagine that it
> would be easy (conceptually) to create a LongDelay which knew when
> (date and time) it should wake up, and kept doing:
>
>     (Delay forSeconds: <whatever>) wait
>
> until the desired time had arrived (or passed). It would simply
> continue the loop if a delay expired too soon.

If I understood correctly, somebody recently did something similar for
Squeak.

> One unwanted complication is that it would probably have to handle
> daylight saving changes. At least that would be true for the general
> case -- I don't know whether you need that feature.

For the most heavily affected systems, I do not allow them to make the
change w/o "permission". We pick one time zone and stick with it for
the duration of surgery. I have some servers for which time is less
critical, and I do allow them to change on their own; I could easily
treat them the same way.

> I started to hack something together, but was reminded again of what a
> dreadful state Dolphin's time and date classes are in (none of Date,
> Time, TimeStamp, SYSTEMTIME, FILETIME, and LocalFILETIME work together
> properly, and between the lot of them don't seem to provide a complete
> feature-set), and gave it up as No Fun At All. Still not difficult,
> though, just messy...

Odd you mention this now. Just days ago, I created a #fromMilliseconds:
for TimeStamp (or something like that - too lazy to check right now).
It seemed to work but needs some tests. Anybody interested in it?

Have a good one,

Bill

--
Wilhelm K. Schwab, Ph.D.
[hidden email]
Bill,
> > If I'm reading the code for Delay correctly, then the effect of a
> > clock wraparound is to make the Delay wake up early.
>
> I respectfully doubt that is the case, only because it seems
> inconsistent with the behavior of the affected systems.

Ah, sorry. I was only guessing at what would happen if
ProcessorScheduler>>signal:afterMilliseconds: were passed a negative
value. Looking at it, it seems that it signals the semaphore
immediately if the milliseconds count is a /small/ negative number, but
that the primitive fails with a walkback if the delay is something like
(2**31) negated, as it would be (I think) when the clock wraps.

I don't know, but maybe for your purposes as simple a fix as changing
Delay>>getDuration so that it never returns less than 0 (or less than
-1) would avoid the problem? That should (I think) make wrapped delays
expire immediately, which you said wouldn't cause you any particular
problems (and is reasonably easy to code around if necessary).
Difficult to test, though...

> Odd you mention this now. Just days ago, I created a
> #fromMilliseconds: for TimeStamp (or something like that - too lazy to
> check right now). It seemed to work but needs some tests. Anybody
> interested in it?

I've pretty much settled on using FILETIME for everything these days.

BTW, for anyone interested, don't forget that Ian's goodies include an
implementation of the ANSI DateAndTime stuff. I haven't used it myself
(I'd forgotten all about it, to be honest) but it looks good.

    -- chris
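[In other words, something like the following - purely illustrative,
since the real body of Delay>>getDuration differs; #millisecondsToGo is
an invented stand-in for whatever the method actually computes:]

    getDuration
        "Answer the remaining delay in milliseconds, clamped at zero so
        a wrapped millisecond clock can never hand the scheduler a huge
        negative duration - the wrapped Delay just expires immediately."
        ^self millisecondsToGo max: 0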