On Sat, Feb 4, 2017 at 7:09 PM, Dan Norton <[hidden email]> wrote: Hi Folks, Freezes on cog_macos64x64_squeak.cog.spur_201702021058 too. Maybe try the 64 bit windows VM? - Bert - |
In reply to this post by Levente Uzonyi
On 02/05/2017 06:31 PM, Levente Uzonyi wrote: > > At this point, I would simply install a REPL and would try to connect > to the image after the morphic freeze to see what happened. > I'm still not convinced about this being a VM bug. It might as well be > an image side bug, which simply doesn't happen on windows due to VM > differences. > In looking for REPL (it's in the Cuis base as Transcripter > readEvalPrint) I found a new way to get a hang-up. In a workspace do: (Transcripter newInFrame: (0@0 extent: 100@200)) nextPutAll: 'Hello there'; endEntry; newLine; print: 355.0/113; endEntry; readEvalPrint. ...and the clock stops. 'top' reports 100 %CPU for squeak. - Dan |
On Mon, Feb 6, 2017 at 7:10 PM, Dan Norton <[hidden email]> wrote:
That is normal behavior - the REPL is not Morphic. Just type "quit" when you're done. - Bert - |
On 02/06/2017 01:16 PM, Bert
Freudenberg wrote:
OK, I'll try to apply it. Meanwhile the attached screen shot shows a hang-up which occurred as I tried to open a browser. The only things done at this point were to open a workspace, pin the Windows menu, load the Freeze package, and then try to open a browser. - Dan Screenshot - 02062017 - 02:47:46 PM.png.bz2 (432K) Download Attachment |
In reply to this post by Bert Freudenberg
Thanks Bert. So the problem is not Linux specific, but indeed Spur64 and Cuis specific. Unfortunately (well, actually fortunately!) I'm on vacations and I don't have any x64 machine available so I can't do any testing right now. I'll be back home in about a week. Thanks,
-- Juan Vuletich www.cuis-smalltalk.org https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev @JuanVuletich |
In reply to this post by David T. Lewis
On 06/02/2017 11:11 a.m., David T. Lewis wrote: > > On Mon, Feb 06, 2017 at 01:19:48PM +0100, Bert Freudenberg wrote: >> >> On Mon, Feb 6, 2017 at 3:15 AM, David T. Lewis<[hidden email]> wrote: >> >>> It would be interesting to try reproducing this in a Squeak image, but I >>> have not quite figured out how to make the Tokenish example work in Squeak. >>> >> See attachment. Did not freeze on Mac with 10 Tokenish jumping. >> - Bert - > I am running it now on Ubuntu 64-bit Spur Squeak, with the same VM that I > used with Cuis. I have been running a dozen Tokenish morphs for about an > hour, so far no freeze. The morphs are bouncing much more slowly on Squeak > than on Cuis, so I think that the test is loading Cuis much more heavily. > > I cannot look further now, but I'll take another look in about 12 hours and > see if I can speed up the Squeak animation and possibly freeze the image. > > Dave Thanks Dave. This is useful information. Cheers, -- Juan Vuletich www.cuis-smalltalk.org https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev @JuanVuletich |
In reply to this post by Bert Freudenberg
On 06/02/2017 01:17 p.m., Bert Freudenberg wrote:
Yes, we'd try that. I'm out of town, and will not be able to do it this week. If anyone wants to try, when I tried Spur64 in Windows the image would crash. The workaround is to clear the World background (an ivar in WorldState). There seems to be some problem with jpeg decompression on that VM. Thanks, -- Juan Vuletich www.cuis-smalltalk.org https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev @JuanVuletich |
2017-02-07 1:42 GMT+01:00 Juan Vuletich <[hidden email]>:
Yes, jpeg is broken in Win64, but works fine in MacOSX 64 and Linux 64 with same sources. I bet that it's related to LLP64... I tired to update to latest available code for this jpeg library, but it does not fix anything (and would introduce controversial deviations to the standard). We could comment out the generation of jpeg plugin in the interim
|
2017-02-07 11:09 GMT+01:00 Nicolas Cellier <[hidden email]>:
Follow-up: jpeg should be fixed for Win64 in latest builds on the server. It was not related to LLP64 but rather a problem of jump_buf alignment. https://github.com/OpenSmalltalk/opensmalltalk-vm/issues/119
|
In reply to this post by Bert Freudenberg
Hi Folks, Today, I was able to get closer to the origin of the problem. The problem is when Morphic tries to do a Delay (inter-cycle-pause) of 1 or two milliseconds. (Usually they are quite larger, and the problem does not appear). With the 64-bit VM, sometimes very short delays hang the complete system. I don't know why, but this: waitDelay _ Delay forMilliseconds: 50. [ true ] whileTrue: [ 1000 atRandom print. waitDelay setDelay: 1; wait ]. is enough to hang Cuis in a short time. When trying to reproduce the problem in Squeak I had mixed results. It doesn't seem to hang there, but if I let this run for a couple of minutes and then try to halt with alt-. , then the hang occurs, and Squeak becomes irresponsibe. As a workaround, in Cuis in the inter cycle pause, I'm avoiding very short delays. I'm not sure if the bug is in the VM, in the image, or both, but it is still there, and it also affects Squeak. I haven't tried it, but Pharo might also have the problem (if not, it can show a possible solution). BTW, Bert, I guess Squeak never does short delays in #interCyclePause: ... Thanks, On 2/6/2017 12:28 PM, Bert Freudenberg wrote:
-- Juan Vuletich www.cuis-smalltalk.org https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev @JuanVuletich |
I finally understood what was happening. The problem with is primitive 136: primSignal: aSemaphore atMilliseconds: aSmallInteger "Signal the semaphore when the millisecond clock reaches the value of the second argument. Fail if the first argument is neither a Semaphore nor nil. Essential. See Object documentation whatIsAPrimitive." <primitive: 136> ^self primitiveFailed If the requested tick is already in the past, all the VMs except for Spur-64 will signal the semaphore anyway. Spur-64 won't. Taking care of this in the Cuis image was easy enough. Squeak seems not to be affected because it uses a newer primitive, that takes the large integer microsecond ticker, that might have the older behavior (signaling anyway). I'm not really sure, and maybe someone could check the primitive behavior and the corresponding assumptions in Squeak. Older Squeak images are not affected, as they are all 32-bit. So maybe there's no action required. In any case, making the behavior consistent across VMs and in both primitives would be nice. Cheers, On 3/3/2017 3:55 PM, Juan Vuletich wrote:
-- Juan Vuletich www.cuis-smalltalk.org https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev @JuanVuletich |
In the VM, the millisecond clock wraps within the 32 bit integer range: #define MillisecondClockMask 0x1FFFFFFF In the Cuis image, Delay class>>handleTimerEvent does this: nextTick := nextTick min: SmallInteger maxVal. On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but on a 32-bit image it is 16r3FFFFFFF. Could that be it? I don't really know how to test in Squeak. As you say, Squeak is now using the microsecond clock in #handleTimerEvent. I do not see anything in primitiveSignalAtMilliseconds that would behave any differently on a 64 bit versus 32 bit image or VM, but I do not know how to test to be sure. Dave On Mon, Mar 06, 2017 at 11:06:22AM -0300, Juan Vuletich wrote: > > I finally understood what was happening. The problem with is primitive 136: > > primSignal: aSemaphore atMilliseconds: aSmallInteger > "Signal the semaphore when the millisecond clock reaches the value > of the second argument. Fail if the first argument is neither a > Semaphore nor nil. Essential. See Object documentation whatIsAPrimitive." > <primitive: 136> > ^self primitiveFailed > > If the requested tick is already in the past, all the VMs except for > Spur-64 will signal the semaphore anyway. Spur-64 won't. Taking care of > this in the Cuis image was easy enough. > > Squeak seems not to be affected because it uses a newer primitive, that > takes the large integer microsecond ticker, that might have the older > behavior (signaling anyway). I'm not really sure, and maybe someone > could check the primitive behavior and the corresponding assumptions in > Squeak. > > Older Squeak images are not affected, as they are all 32-bit. So maybe > there's no action required. In any case, making the behavior consistent > across VMs and in both primitives would be nice. > > Cheers, > > On 3/3/2017 3:55 PM, Juan Vuletich wrote: > > > > > > > >Hi Folks, > > > >Today, I was able to get closer to the origin of the problem. The > >problem is when Morphic tries to do a Delay (inter-cycle-pause) of 1 > >or two milliseconds. (Usually they are quite larger, and the problem > >does not appear). With the 64-bit VM, sometimes very short delays hang > >the complete system. I don't know why, but this: > > > >waitDelay _ Delay forMilliseconds: 50. > >[ true ] whileTrue: [ > > 1000 atRandom print. > > waitDelay setDelay: 1; wait ]. > > > >is enough to hang Cuis in a short time. When trying to reproduce the > >problem in Squeak I had mixed results. It doesn't seem to hang there, > >but if I let this run for a couple of minutes and then try to halt > >with alt-. , then the hang occurs, and Squeak becomes irresponsibe. As > >a workaround, in Cuis in the inter cycle pause, I'm avoiding very > >short delays. > > > >I'm not sure if the bug is in the VM, in the image, or both, but it > >is still there, and it also affects Squeak. I haven't tried it, but > >Pharo might also have the problem (if not, it can show a possible > >solution). > > > >BTW, Bert, I guess Squeak never does short delays in #interCyclePause: ... > > > >Thanks, > > > >On 2/6/2017 12:28 PM, Bert Freudenberg wrote: > >> > >> > >> > >>Yes, I tried with Spur64. Have not seen it freeze yet on Mac in > >>Squeak, but we possibly have not replicated the test well enough yet. > >> > >>It *did* freeze in Cuis Spur64 on Mac too. > >> > >>- Bert - > >> > >>On Mon, Feb 6, 2017 at 2:08 PM, Juan Vuletich <[hidden email] > >><mailto:[hidden email]>> wrote: > >> > >> > >> Were you usig Spur64? On Debian it does not freeze with CogV3 or > >> Spur32. Only with Spur64. > >> > >> Thanks, > >> Juan > >> > >> Sent using Zoho Mail <https://www.zoho.com/mail/> > >> > >> > >> ---- On Mon, 06 Feb 2017 09:19:48 -0300 *[hidden email] > >> <mailto:[hidden email]> * wrote ---- > >> > >> On Mon, Feb 6, 2017 at 3:15 AM, David T. Lewis > >> <[hidden email] <mailto:[hidden email]>> wrote: > >> > >> It would be interesting to try reproducing this in a > >> Squeak image, but I > >> have not quite figured out how to make the Tokenish > >> example work in Squeak. > >> > >> > >> See attachment. Did not freeze on Mac with 10 Tokenish jumping. > >> - Bert - > >> > >> > >> > > > > > >-- > >Juan Vuletich > >www.cuis-smalltalk.org > >https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev > >@JuanVuletich > > > -- > Juan Vuletich > www.cuis-smalltalk.org > https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev > @JuanVuletich > |
Hi Dave, Thanks for answering. Inline. On 3/6/2017 10:50 PM, David T. Lewis wrote: > > In the VM, the millisecond clock wraps within the 32 bit integer range: > > #define MillisecondClockMask 0x1FFFFFFF > > In the Cuis image, Delay class>>handleTimerEvent does this: > > nextTick := nextTick min: SmallInteger maxVal. > > On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but on > a 32-bit image it is 16r3FFFFFFF. > > Could that be it? I wasn't aware of that, and had assumed that millisecond timer would use the whole SmallInteger range. This might introduce a bug, that would only appear at timer rollover, i.e. about 6 days after image startup. I'll fix this. Thanks. But this is a completely separated issue. The problem we saw, the semaphore never being signaled if deadline is in the past, happens immediately after image startup. > I don't really know how to test in Squeak. As you say, Squeak is now > using the microsecond clock in #handleTimerEvent. I do not see anything > in primitiveSignalAtMilliseconds that would behave any differently on > a 64 bit versus 32 bit image or VM, but I do not know how to test to > be sure. > > Dave Well, what follows is a way to test VM behavior. Tested in Cuis, but should be trivial to reproduce in Squeak, as it is a VM issue. I Cuis add (copied from Squeak): !Time class methodsFor: 'general inquiries' stamp: 'jmv 3/7/2017 08:58:12'! utcMicrosecondClock "Answer the UTC microseconds since the Smalltalk epoch (January 1st 1901, the start of the 20th century). The value is derived from the Posix epoch with a constant offset corresponding to elapsed microseconds between the two epochs according to RFC 868." <primitive: 240> ^0! ! !Delay class methodsFor: 'primitives' stamp: 'jmv 3/7/2017 08:57:45'! primSignal: aSemaphore atUTCMicroseconds: anInteger "Signal the semaphore when the UTC microsecond clock reaches the value of the second argument. Fail if the first argument is neither a Semaphore nor nil, or if the second argument is not an integer. Essential. See Object documentation whatIsAPrimitive." <primitive: 242> ^self primitiveFailed! ! Then, in a Workspace, try the following 4 doits: s _ Semaphore new. Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock + 10. s wait. 'Ok' print. s _ Semaphore new. Delay primSignal: s atMilliseconds: Time millisecondClockValue + 10. s wait. 'Ok' print. s _ Semaphore new. Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock - 10. s wait. 'Ok' print. s _ Semaphore new. Delay primSignal: s atMilliseconds: Time millisecondClockValue - 10. s wait. 'Not OK at all' print. On Spur32, all 4 finish immediately. On Spur64, the first 3 also finish immediately, but the fourth freezes the image. The difference in behavior between Spur32 and Spur64 (on Linux) is indeed there. Ok. Also tried Squeak (note that instead of #millisecondClockValue in Squeak it is #primMillisecondClock) : s _ Semaphore new. Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock + 10. s wait. 'Ok'. s _ Semaphore new. Delay primSignal: s atMilliseconds: Time primMillisecondClock + 10. s wait. 'Ok'. s _ Semaphore new. Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock - 10. s wait. 'Ok'. s _ Semaphore new. Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10. s wait. 'Not OK at all'. Exactly the same behavior. I just took a look at static void primitiveSignalAtMilliseconds(void) in https://raw.githubusercontent.com/OpenSmalltalk/opensmalltalk-vm/Cog/src/vm/cointerp.c The only thing I see is that msecs is an usqInt and deltaMsecs is an sqInt. But I'm not good enough at gcc subtleties to say if this matters at all. I mean, it looks as if 'if (deltaMsecs < 0) {' was true on Spur64 and false on Spur32... Or maybe the difference is in the handling of nextWakeupUsecs ... In any case, it looks like deadlines in the past are not supported (as code assumes they are because of rollover...) Thanks, -- Juan Vuletich www.cuis-smalltalk.org https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev @JuanVuletich |
Hi Juan, I confirm your Cuis test results on Squeak using several image/VM combinations, details below. On Tue, Mar 07, 2017 at 09:49:53AM -0300, Juan Vuletich wrote: > Hi Dave, > > Thanks for answering. Inline. > > On 3/6/2017 10:50 PM, David T. Lewis wrote: > > > >In the VM, the millisecond clock wraps within the 32 bit integer range: > > > > #define MillisecondClockMask 0x1FFFFFFF > > > >In the Cuis image, Delay class>>handleTimerEvent does this: > > > > nextTick := nextTick min: SmallInteger maxVal. > > > >On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but on > >a 32-bit image it is 16r3FFFFFFF. > > > >Could that be it? > > I wasn't aware of that, and had assumed that millisecond timer would use > the whole SmallInteger range. This might introduce a bug, that would > only appear at timer rollover, i.e. about 6 days after image startup. > I'll fix this. Thanks. > > But this is a completely separated issue. The problem we saw, the > semaphore never being signaled if deadline is in the past, happens > immediately after image startup. > > >I don't really know how to test in Squeak. As you say, Squeak is now > >using the microsecond clock in #handleTimerEvent. I do not see anything > >in primitiveSignalAtMilliseconds that would behave any differently on > >a 64 bit versus 32 bit image or VM, but I do not know how to test to > >be sure. > > > >Dave > > Well, what follows is a way to test VM behavior. Tested in Cuis, but > should be trivial to reproduce in Squeak, as it is a VM issue. I Cuis > add (copied from Squeak): I tried this with 4 different Squeak image/VM combinations: - Squeak 3.8 with interpreter VM (an older image that uses millisecond clock for Delay) - Squeak trunk V3 image with interpreter VM (latest version image, but non-Spur, updated via www.squeaksource.com/TrunkUpdateStreamV3) - Squeak trunk 32 bit Spur - Squeak trunk 64 bit Spur > > !Time class methodsFor: 'general inquiries' stamp: 'jmv 3/7/2017 08:58:12'! > utcMicrosecondClock > "Answer the UTC microseconds since the Smalltalk epoch (January 1st > 1901, the start of the 20th century). > The value is derived from the Posix epoch with a constant offset > corresponding to elapsed microseconds > between the two epochs according to RFC 868." > <primitive: 240> > ^0! ! > > !Delay class methodsFor: 'primitives' stamp: 'jmv 3/7/2017 08:57:45'! > primSignal: aSemaphore atUTCMicroseconds: anInteger > "Signal the semaphore when the UTC microsecond clock reaches the > value of the second argument. > Fail if the first argument is neither a Semaphore nor nil, or if > the second argument is not an integer. > Essential. See Object documentation whatIsAPrimitive." > <primitive: 242> > ^self primitiveFailed! ! I tried adding these to my Squeak 3.8 image for the test. It does not work properly because the primitive table was different back then, and the interpreter VM is automatically adjusting for this so not calling primitive 240 (actually it calls the old #primitiveSerialPortWrite rather than #primitiveUTCMicrosecondClock that later replaced it). Nevertheless, the primSignal:atMilliseconds: works, and there is no problem with a -10 parameter, so these are included marked in the results below. I also note that I locked up the Squeak 3.8 image a couple of times while running various tests with bad input parameters. It is not reproduceable, but there may be something bad about calling #primSignal:atMilliseconds: in an image that is also using it for the Delay mechanism. I also locked up a Spur 32 image when calling primSignal:atUTCMicroseconds: so this may be the same problem, it may not be safe to call this when the same method is being used for Delay handling. > > Then, in a Workspace, try the following 4 doits: > > s _ Semaphore new. > Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock + 10. > s wait. > 'Ok' print. > > s _ Semaphore new. > Delay primSignal: s atMilliseconds: Time millisecondClockValue + 10. > s wait. > 'Ok' print. > > s _ Semaphore new. > Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock - 10. > s wait. > 'Ok' print. > > s _ Semaphore new. > Delay primSignal: s atMilliseconds: Time millisecondClockValue - 10. > s wait. > 'Not OK at all' print. > > On Spur32, all 4 finish immediately. On Spur64, the first 3 also finish > immediately, but the fourth freezes the image. The difference in > behavior between Spur32 and Spur64 (on Linux) is indeed there. > > > Ok. Also tried Squeak (note that instead of #millisecondClockValue in > Squeak it is #primMillisecondClock) : Test results for my four Squeak image/VM combinations are added below. > > s _ Semaphore new. > Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock + 10. > s wait. > 'Ok'. Squeak 3.8 => OK Squeak trunk V3 interpreter => OK Squeak trunk Spur 32 => OK Squeak trunk Spur 64 => OK > > s _ Semaphore new. > Delay primSignal: s atMilliseconds: Time primMillisecondClock + 10. > s wait. > 'Ok'. Squeak 3.8 => OK Squeak trunk V3 interpreter => OK Squeak trunk Spur 32 => OK Squeak trunk Spur 64 => OK > > s _ Semaphore new. > Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock - 10. > s wait. > 'Ok'. Squeak 3.8 => primitive failed (but see note above for Squeak 3.8 using different primitive table) Squeak trunk V3 interpreter => OK Squeak trunk Spur 32 => OK Squeak trunk Spur 64 => OK > > s _ Semaphore new. > Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10. > s wait. > 'Not OK at all'. Squeak 3.8 => OK Squeak trunk V3 interpreter => OK Squeak trunk Spur 32 => OK Squeak trunk Spur 64 => Not OK at all, hangs image > > Exactly the same behavior. Confirmed. > > I just took a look at > static void primitiveSignalAtMilliseconds(void) > in > https://raw.githubusercontent.com/OpenSmalltalk/opensmalltalk-vm/Cog/src/vm/cointerp.c > The only thing I see is that msecs is an usqInt and deltaMsecs is an > sqInt. But I'm not good enough at gcc subtleties to say if this matters > at all. I mean, it looks as if 'if (deltaMsecs < 0) {' was true on > Spur64 and false on Spur32... Or maybe the difference is in the handling > of nextWakeupUsecs ... I see that ioMSecs() is declared as signed long (32 bits), but it is used in expression with a 64 bit usqInt. So maybe it needs a cast, or maybe the variables like msecs and deltaMsecs in primitiveSignalAtMilliseconds should be declared as 32 bit long and unsigned long to match the actual usage. Unfortunately I cannot easily recompile to verify (build problems on my Ubuntu for Cog/Spur, sorry), but maybe someone else can take a look at this? > > In any case, it looks like deadlines in the past are not supported (as > code assumes they are because of rollover...) I agree this looks like a bug in the 64 bit VMs. But I do not yet see the reason for it. Dave |
In reply to this post by Juan Vuletich-3
Hi Juan, Hi All,
On Mon, Mar 6, 2017 at 6:06 AM, Juan Vuletich <[hidden email]> wrote:
Yes, but primitive 136 *should not be used* :-). You should use primitive 242: primSignal: aSemaphore atUTCMicroseconds: anInteger "Signal the semaphore when the UTC microsecond clock reaches the value of the second argument. Fail if the first argument is neither a Semaphore nor nil, or if the second argument is not an integer. Essential. See Object documentation whatIsAPrimitive." <primitive: 242> ^self primitiveFailed The use of the UTC microsecond count eliminates any and all wrapping issues for about 50,000 years [puhleaze Mr Pruitt can you try and do your bit to make it possible that we'll make it that far. hic. ed.]. The old primitive 136 primSignal:atMilliseconds: is subject to wrapping problems. I see that primSignal:atMilliseconds: is still used in Squeak 5.x but only to turn off signaling: Delay class>>shutDown "Suspend the active delay, if any, before snapshotting. It will be reactived when the snapshot is resumed." "Details: This prevents a timer interrupt from waking up the active delay in the midst snapshoting, since the active delay will be restarted when resuming the snapshot and we don't want to process the delay twice." AccessProtect wait. self primSignal: nil atMilliseconds: 0. self saveResumptionTimes. DelaySuspended := true. Delay class>>stopTimerInterruptWatcher "Reset the class variables that keep track of active Delays and re-start the timer interrupt watcher process. Any currently scheduled delays are forgotten." "Delay startTimerInterruptWatcher" self primSignal: nil atMilliseconds: 0. TimingSemaphore ifNotNil:[TimingSemaphore terminateProcess]. I will change this now. HTH
_,,,^..^,,,_ best, Eliot |
In reply to this post by David T. Lewis
Hi David,
On Mon, Mar 6, 2017 at 5:50 PM, David T. Lewis <[hidden email]> wrote:
I bet that the following code from primitiveSignalAtMilliseconds ends up wrapping when given Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10. deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask). deltaMsecs < 0 ifTrue: [deltaMsecs := deltaMsecs + MillisecondClockMask + 1]. nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000). and I bet you'll find that the VM will wake up in about 6 days and 5 hours ;-) I suppose we could fix this, but I'm *much* happier to simply not use primitiveSignalAtMilliseconds and stay with the simpler and wrapping-immune primitiveSignalAtUTCMicroseconds
_,,,^..^,,,_ best, Eliot |
On Thu, Mar 09, 2017 at 12:41:44PM -0800, Eliot Miranda wrote: > > Hi David, > > On Mon, Mar 6, 2017 at 5:50 PM, David T. Lewis <[hidden email]> wrote: > > > > > In the VM, the millisecond clock wraps within the 32 bit integer range: > > > > #define MillisecondClockMask 0x1FFFFFFF > > > > In the Cuis image, Delay class>>handleTimerEvent does this: > > > > nextTick := nextTick min: SmallInteger maxVal. > > > > On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but on > > a 32-bit image it is 16r3FFFFFFF. > > > > Could that be it? > > > > I don't really know how to test in Squeak. As you say, Squeak is now > > using the microsecond clock in #handleTimerEvent. I do not see anything > > in primitiveSignalAtMilliseconds that would behave any differently on > > a 64 bit versus 32 bit image or VM, but I do not know how to test to > > be sure. > > > > I bet that the following code from primitiveSignalAtMilliseconds ends up > wrapping when given > Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10. > > deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask). > deltaMsecs < 0 ifTrue: > [deltaMsecs := deltaMsecs + MillisecondClockMask + 1]. > nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000). > > and I bet you'll find that the VM will wake up in about 6 days and 5 hours > ;-) > No. The failure is specific to the 64 bit VM. Source code for the primitive is the same in either case. > > I suppose we could fix this, but I'm *much* happier to simply not use > primitiveSignalAtMilliseconds and stay with the simpler and wrapping-immune > primitiveSignalAtUTCMicroseconds Fair enough, but given that there is a demonstrated bug that affects only the 64-bit VM, and given that it expresses itself intermittently and in ways that affect only someone who is attempting to migrate their existing V3 image to Spur, then I would say that it makes very good sense to take the time to fix it if we are able to do so. After all, there may be other people who will want to migrate V3 images to Spur, and there is no point in making the process needlessly difficult. I do not have the solution, but maybe someone else can help. So I am asking for help here. Can someone with a working 64-bit build environment please check and see if what I said in an earlier email might make a difference: I see that ioMSecs() is declared as signed long (32 bits), but it is used in expression with a 64 bit usqInt. So maybe it needs a cast, or maybe the variables like msecs and deltaMsecs in primitiveSignalAtMilliseconds should be declared as 32 bit long and unsigned long to match the actual usage. Thanks, Dave |
2017-03-10 3:47 GMT+01:00 David T. Lewis <[hidden email]>:
> Hi David I just reviewed the generated code for primitiveSignalAtMilliseconds andhere is what i found: The primitive will simply fail if fed with a negative integer, thanks to: msecs = positive32BitValueOf(msecsObj); msecs is allways positive, since declared as usqInt msecs; So far, so good. Then deltaMsecs = msecs - ((ioMSecs()) & MillisecondClockMask); if (deltaMsecs < 0) { deltaMsecs = (deltaMsecs + MillisecondClockMask) + 1; } GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) + (deltaMsecs * 1000); Due to the 29 bits Mask, the bitAnd: (&) operation will result into a positive int. 0 <= ((ioMSecs()) & MillisecondClockMask) <= 0x1FFFFFFF However, when we do the Roll-over thing: deltaMsecs = (deltaMsecs + MillisecondClockMask) + 1; Then we create a giant delay (the 6 days mentionned by Eliot) which is HIGHLY QUESTIONNABLE whatever 32 or 64 bits! The difference between 32 and 64 bits is in the next line: GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) + (deltaMsecs * 1000); Because sqInt deltaMsecs; is 32 bits for a 32bits spur, 64bits for 64bits spur, then (deltaMsecs * 1000) will do two different things: - in 32 bits it will overflow leading to a negative delay and a nextWakeupUsecs in the past (but that's invoking Undefined Behavior) - in 64 bits it won't overflow leading to he 6 days delay Does that explain? if (deltaMsecs < 0) I would either fail the primitive or set the deltaMsecs to zero. What do you think? Nicolas |
2017-03-10 9:45 GMT+01:00 Nicolas Cellier <[hidden email]>:
then (clock+delay) won't overflow (thanks to largeIntegers). Eventually, for delay > 2^32-2^29 msecs, the primitive will fail thru (positive32BitValueOf). That's something like 43 days delay max, more than enough... deltaMsecs = (msecs - ioMSecs()) & MillisecondClockMask; or we should leave deltaMsecs as is and reject negative delays Also note this in 32 bits: Since *1000 consumes 10bits, then delays above 21 bits msecs will overflow and be signalled immediately. Tha's about 2e6 msecs, 2e3 secs. Thus the maximum delay supported by this primitive is a bit less than 36 minutes (16r80000000/1000/1000/60). So maybe we should also reject large positive "unreasonable" delays, that isanthing above a few minutes and document it. |
In reply to this post by Nicolas Cellier
On Fri, Mar 10, 2017 at 09:45:07AM +0100, Nicolas Cellier wrote: > > 2017-03-10 3:47 GMT+01:00 David T. Lewis <[hidden email]>: > > > > > On Thu, Mar 09, 2017 at 12:41:44PM -0800, Eliot Miranda wrote: > > > > > > Hi David, > > > > > > On Mon, Mar 6, 2017 at 5:50 PM, David T. Lewis <[hidden email]> > > wrote: > > > > > > > > > > > In the VM, the millisecond clock wraps within the 32 bit integer range: > > > > > > > > #define MillisecondClockMask 0x1FFFFFFF > > > > > > > > In the Cuis image, Delay class>>handleTimerEvent does this: > > > > > > > > nextTick := nextTick min: SmallInteger maxVal. > > > > > > > > On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but > > on > > > > a 32-bit image it is 16r3FFFFFFF. > > > > > > > > Could that be it? > > > > > > > > I don't really know how to test in Squeak. As you say, Squeak is now > > > > using the microsecond clock in #handleTimerEvent. I do not see anything > > > > in primitiveSignalAtMilliseconds that would behave any differently on > > > > a 64 bit versus 32 bit image or VM, but I do not know how to test to > > > > be sure. > > > > > > > > > > I bet that the following code from primitiveSignalAtMilliseconds ends up > > > wrapping when given > > > Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10. > > > > > > deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask). > > > deltaMsecs < 0 ifTrue: > > > [deltaMsecs := deltaMsecs + MillisecondClockMask + 1]. > > > nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000). > > > > > > and I bet you'll find that the VM will wake up in about 6 days and 5 > > hours > > > ;-) > > > > > > > No. The failure is specific to the 64 bit VM. Source code for the primitive > > is the same in either case. > > > > > > > > > I suppose we could fix this, but I'm *much* happier to simply not use > > > primitiveSignalAtMilliseconds and stay with the simpler and > > wrapping-immune > > > primitiveSignalAtUTCMicroseconds > > > > Fair enough, but given that there is a demonstrated bug that affects only > > the 64-bit VM, and given that it expresses itself intermittently and in > > ways > > that affect only someone who is attempting to migrate their existing V3 > > image > > to Spur, then I would say that it makes very good sense to take the time > > to fix it if we are able to do so. After all, there may be other people who > > will want to migrate V3 images to Spur, and there is no point in making the > > process needlessly difficult. > > > > I do not have the solution, but maybe someone else can help. So I am asking > > for help here. Can someone with a working 64-bit build environment please > > check and see if what I said in an earlier email might make a difference: > > > > I see that ioMSecs() is declared as signed long (32 bits), but it is used > > in expression with a 64 bit usqInt. So maybe it needs a cast, or maybe > > the variables like msecs and deltaMsecs in primitiveSignalAtMilliseconds > > should be declared as 32 bit long and unsigned long to match the actual > > usage. > > > > Thanks, > > > > Dave > > > > > > Hi David > I just reviewed the generated code for primitiveSignalAtMilliseconds > andhere is what i found: > > The primitive will simply fail if fed with a negative integer, thanks to: > > msecs = positive32BitValueOf(msecsObj); > > msecs is allways positive, since declared as > usqInt msecs; > > So far, so good. > > Then > deltaMsecs = msecs - ((ioMSecs()) & > MillisecondClockMask); > if (deltaMsecs < 0) { > deltaMsecs = (deltaMsecs + > MillisecondClockMask) + 1; > } > GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) + > (deltaMsecs * 1000); > > Due to the 29 bits Mask, the bitAnd: (&) operation will result into a > positive int. > 0 <= ((ioMSecs()) & MillisecondClockMask) <= 0x1FFFFFFF > > However, when we do the Roll-over thing: > deltaMsecs = (deltaMsecs + > MillisecondClockMask) + 1; > Then we create a giant delay (the 6 days mentionned by Eliot) which is > HIGHLY QUESTIONNABLE whatever 32 or 64 bits! > > The difference between 32 and 64 bits is in the next line: > GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) + > (deltaMsecs * 1000); > Because > sqInt deltaMsecs; > > is 32 bits for a 32bits spur, 64bits for 64bits spur, then (deltaMsecs * > 1000) will do two different things: > - in 32 bits it will overflow leading to a negative delay and a > nextWakeupUsecs in the past (but that's invoking Undefined Behavior) > - in 64 bits it won't overflow leading to he 6 days delay > > Does that explain? > > if (deltaMsecs < 0) I would either fail the primitive or set the deltaMsecs > to zero. > What do you think? Hi Nicolas, Thanks very much for looking into this and for the explanation. I am away right now so I cannot look at it now, but it does seem likely to be a sign extension issue at least in part. If you have a chance to try recompiling with changes to the generated code, Juan's test condition is just this: s := Semaphore new. Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10. s wait. This locks the image on 64-bits, and not on 32-bits. Thanks a lot! Dave |
Free forum by Nabble | Edit this page |