Freeze after Morph Activity

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
68 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Re: Freeze after Morph Activity

Bert Freudenberg
 
On Sat, Feb 4, 2017 at 7:09 PM, Dan Norton <[hidden email]> wrote:
 Hi Folks,

On Debian8, using cog_linux64x64_squeak.cog.spur_201702021058 and Cuis5.0-3040-spur-64 updated to 3043, everything will stop, including the clock in the Cuis task bar. Neither mouse nor keyboard events are processed. Trying to interrupt with alt-. is unsuccessful. CogSpur was re-cloned and the test case was re-run with the same result.

On Windows 10, using cog_win32x86_squeak.cog.spur_201702021058, and Cuis 5.0 3043, the same test case runs for hours with no problem.

Freezes on cog_macos64x64_squeak.cog.spur_201702021058 too. Maybe try the 64 bit windows VM?

- Bert -
 
Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph Activity

Dan Norton
In reply to this post by Levente Uzonyi
 


On 02/05/2017 06:31 PM, Levente Uzonyi wrote:
>
> At this point, I would simply install a REPL and would try to connect
> to the image after the morphic freeze to see what happened.
> I'm still not convinced about this being a VM bug. It might as well be
> an image side bug, which simply doesn't happen on windows due to VM
> differences.
>

In looking for REPL (it's in the Cuis base as Transcripter >
readEvalPrint) I found a new way to get a hang-up. In a workspace do:

(Transcripter newInFrame: (0@0 extent: 100@200))
     nextPutAll: 'Hello there'; endEntry;
     newLine; print: 355.0/113; endEntry;
     readEvalPrint.

...and the clock stops. 'top' reports 100 %CPU for squeak.

  - Dan

Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph Activity

Bert Freudenberg
 
On Mon, Feb 6, 2017 at 7:10 PM, Dan Norton <[hidden email]> wrote:

In looking for REPL (it's in the Cuis base as Transcripter > readEvalPrint) I found a new way to get a hang-up. In a workspace do:

(Transcripter newInFrame: (0@0 extent: 100@200))
    nextPutAll: 'Hello there'; endEntry;
    newLine; print: 355.0/113; endEntry;
    readEvalPrint.

...and the clock stops. 'top' reports 100 %CPU for squeak.

That is normal behavior - the REPL is not Morphic. Just type "quit" when you're done.

- Bert - 
Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph Activity

Dan Norton
 



On 02/06/2017 01:16 PM, Bert Freudenberg wrote:
 


On Mon, Feb 6, 2017 at 7:10 PM, Dan Norton <[hidden email]> wrote:

In looking for REPL (it's in the Cuis base as Transcripter > readEvalPrint) I found a new way to get a hang-up. In a workspace do:

(Transcripter newInFrame: (0@0 extent: 100@200))
    nextPutAll: 'Hello there'; endEntry;
    newLine; print: 355.0/113; endEntry;
    readEvalPrint.

...and the clock stops. 'top' reports 100 %CPU for squeak.

That is normal behavior - the REPL is not Morphic. Just type "quit" when you're done.


OK, I'll try to apply it. Meanwhile the attached screen shot shows a hang-up which occurred as I tried to open a browser. The only things done at this point were to open a workspace, pin the Windows menu, load the Freeze package, and then try to open a browser.

 - Dan


Screenshot - 02062017 - 02:47:46 PM.png.bz2 (432K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph&amp; nbsp; &amp; nbsp; &amp; nbsp; &amp; nbsp; Activity

Juan Vuletich-3
In reply to this post by Bert Freudenberg
 
Thanks Bert. So the problem is not Linux specific, but indeed Spur64 and Cuis specific.

Unfortunately (well, actually fortunately!) I'm on vacations and I don't have any x64 machine available so I can't do any testing right now. I'll be back home in about a week.

Thanks,


Yes, I tried with Spur64. Have not seen it freeze yet on Mac in Squeak, but we possibly have not replicated the test well enough yet.

It *did* freeze in Cuis Spur64 on Mac too. 

- Bert -

-- 
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
@JuanVuletich
Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph Activity

Juan Vuletich-3
In reply to this post by David T. Lewis
 
On 06/02/2017 11:11 a.m., David T. Lewis wrote:

>
> On Mon, Feb 06, 2017 at 01:19:48PM +0100, Bert Freudenberg wrote:
>>
>> On Mon, Feb 6, 2017 at 3:15 AM, David T. Lewis<[hidden email]>  wrote:
>>
>>> It would be interesting to try reproducing this in a Squeak image, but I
>>> have not quite figured out how to make the Tokenish example work in Squeak.
>>>
>> See attachment. Did not freeze on Mac with 10 Tokenish jumping.
>> - Bert -
> I am running it now on Ubuntu 64-bit Spur Squeak, with the same VM that I
> used with Cuis. I have been running a dozen Tokenish morphs for about an
> hour, so far no freeze. The morphs are bouncing much more slowly on Squeak
> than on Cuis, so I think that the test is loading Cuis much more heavily.
>
> I cannot look further now, but I'll take another look in about 12 hours and
> see if I can speed up the Squeak animation and possibly freeze the image.
>
> Dave

Thanks Dave. This is useful information.

Cheers,

--
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
@JuanVuletich


Reply | Threaded
Open this post in threaded view
|

Re: Freeze after Morph Activity

Juan Vuletich-3
In reply to this post by Bert Freudenberg
 
On 06/02/2017 01:17 p.m., Bert Freudenberg wrote:
 


On Sat, Feb 4, 2017 at 7:09 PM, Dan Norton <[hidden email]> wrote:
 Hi Folks,

On Debian8, using cog_linux64x64_squeak.cog.spur_201702021058 and Cuis5.0-3040-spur-64 updated to 3043, everything will stop, including the clock in the Cuis task bar. Neither mouse nor keyboard events are processed. Trying to interrupt with alt-. is unsuccessful. CogSpur was re-cloned and the test case was re-run with the same result.

On Windows 10, using cog_win32x86_squeak.cog.spur_201702021058, and Cuis 5.0 3043, the same test case runs for hours with no problem.

Freezes on cog_macos64x64_squeak.cog.spur_201702021058 too. Maybe try the 64 bit windows VM?

- Bert -
 

Yes, we'd try that. I'm out of town, and will not be able to do it this week.

If anyone wants to try, when I tried Spur64 in Windows the image would crash. The workaround is to clear the World background (an ivar in WorldState). There seems to be some problem with jpeg decompression on that VM.

Thanks,
-- 
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
@JuanVuletich
Reply | Threaded
Open this post in threaded view
|

Re: Freeze after Morph Activity

Nicolas Cellier
 


2017-02-07 1:42 GMT+01:00 Juan Vuletich <[hidden email]>:
 
On 06/02/2017 01:17 p.m., Bert Freudenberg wrote:
 


On Sat, Feb 4, 2017 at 7:09 PM, Dan Norton <[hidden email]> wrote:
 Hi Folks,

On Debian8, using cog_linux64x64_squeak.cog.spur_201702021058 and Cuis5.0-3040-spur-64 updated to 3043, everything will stop, including the clock in the Cuis task bar. Neither mouse nor keyboard events are processed. Trying to interrupt with alt-. is unsuccessful. CogSpur was re-cloned and the test case was re-run with the same result.

On Windows 10, using cog_win32x86_squeak.cog.spur_201702021058, and Cuis 5.0 3043, the same test case runs for hours with no problem.

Freezes on cog_macos64x64_squeak.cog.spur_201702021058 too. Maybe try the 64 bit windows VM?

- Bert -
 

Yes, we'd try that. I'm out of town, and will not be able to do it this week.

If anyone wants to try, when I tried Spur64 in Windows the image would crash. The workaround is to clear the World background (an ivar in WorldState). There seems to be some problem with jpeg decompression on that VM.

Thanks,

Yes, jpeg is broken in Win64, but works fine in MacOSX 64 and Linux 64 with same sources.
I bet that it's related to LLP64...
I tired to update to latest available code for this jpeg library, but it does not fix anything (and would introduce controversial deviations to the standard).
We could comment out the generation of jpeg plugin in the interim
 


Reply | Threaded
Open this post in threaded view
|

Re: Freeze after Morph Activity

Nicolas Cellier
 


2017-02-07 11:09 GMT+01:00 Nicolas Cellier <[hidden email]>:


2017-02-07 1:42 GMT+01:00 Juan Vuletich <[hidden email]>:
 
On 06/02/2017 01:17 p.m., Bert Freudenberg wrote:
 


On Sat, Feb 4, 2017 at 7:09 PM, Dan Norton <[hidden email]> wrote:
 Hi Folks,

On Debian8, using cog_linux64x64_squeak.cog.spur_201702021058 and Cuis5.0-3040-spur-64 updated to 3043, everything will stop, including the clock in the Cuis task bar. Neither mouse nor keyboard events are processed. Trying to interrupt with alt-. is unsuccessful. CogSpur was re-cloned and the test case was re-run with the same result.

On Windows 10, using cog_win32x86_squeak.cog.spur_201702021058, and Cuis 5.0 3043, the same test case runs for hours with no problem.

Freezes on cog_macos64x64_squeak.cog.spur_201702021058 too. Maybe try the 64 bit windows VM?

- Bert -
 

Yes, we'd try that. I'm out of town, and will not be able to do it this week.

If anyone wants to try, when I tried Spur64 in Windows the image would crash. The workaround is to clear the World background (an ivar in WorldState). There seems to be some problem with jpeg decompression on that VM.

Thanks,

Yes, jpeg is broken in Win64, but works fine in MacOSX 64 and Linux 64 with same sources.
I bet that it's related to LLP64...
I tired to update to latest available code for this jpeg library, but it does not fix anything (and would introduce controversial deviations to the standard).
We could comment out the generation of jpeg plugin in the interim
 

Follow-up: jpeg should be fixed for Win64 in latest builds on the server. It was not related to LLP64 but rather a problem of jump_buf alignment.
https://github.com/OpenSmalltalk/opensmalltalk-vm/issues/119
 

Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph&amp; nbsp; &amp; nbsp; &amp; nbsp; &amp; nbsp; Activity

Juan Vuletich-3
In reply to this post by Bert Freudenberg
 
Hi Folks,

Today, I was able to get closer to the origin of the problem. The problem is when Morphic tries to do a Delay (inter-cycle-pause) of 1 or two milliseconds. (Usually they are quite larger, and the problem does not appear). With the 64-bit VM, sometimes very short delays hang the complete system. I don't know why, but this:

waitDelay _ Delay forMilliseconds: 50.
[ true ] whileTrue: [
    1000 atRandom print.
    waitDelay setDelay: 1; wait ].

is enough to hang Cuis in a short time. When trying to reproduce the problem in Squeak I had mixed results. It doesn't seem to hang there, but if I let this run for a couple of minutes and then try to halt with alt-. , then the hang occurs, and Squeak becomes irresponsibe. As a workaround, in Cuis in the inter cycle pause, I'm avoiding very short delays.

I'm not sure if the bug is in the VM, in  the image, or both, but it is still there, and it also affects Squeak. I haven't tried it, but Pharo might also have the problem (if not, it can show a possible solution).

BTW, Bert, I guess Squeak never does short delays in #interCyclePause: ...

Thanks,

On 2/6/2017 12:28 PM, Bert Freudenberg wrote:
 


Yes, I tried with Spur64. Have not seen it freeze yet on Mac in Squeak, but we possibly have not replicated the test well enough yet.

It *did* freeze in Cuis Spur64 on Mac too. 

- Bert -

On Mon, Feb 6, 2017 at 2:08 PM, Juan Vuletich <[hidden email]> wrote:
 
Were you usig Spur64? On Debian it does not freeze with CogV3 or Spur32. Only with Spur64.

Thanks,
Juan

Sent using Zoho Mail


---- On Mon, 06 Feb 2017 09:19:48 -0300 [hidden email] wrote ----

On Mon, Feb 6, 2017 at 3:15 AM, David T. Lewis <[hidden email]> wrote:
It would be interesting to try reproducing this in a Squeak image, but I
have not quite figured out how to make the Tokenish example work in Squeak.

See attachment. Did not freeze on Mac with 10 Tokenish jumping.
- Bert -
 




-- 
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
@JuanVuletich
Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph&amp; nbsp; &amp; nbsp; &amp; nbsp; &amp; nbsp; Activity

Juan Vuletich-3
 
I finally understood what was happening. The problem with is primitive 136:

primSignal: aSemaphore atMilliseconds: aSmallInteger
    "Signal the semaphore when the millisecond clock reaches the value of the second argument. Fail if the first argument is neither a Semaphore nor nil. Essential. See Object documentation whatIsAPrimitive."
    <primitive: 136>
    ^self primitiveFailed

If the requested tick is already in the past, all the VMs except for Spur-64 will signal the semaphore anyway. Spur-64 won't. Taking care of this in the Cuis image was easy enough.

Squeak seems not to be affected because it uses a newer primitive, that takes the large integer microsecond ticker, that might have the older behavior (signaling anyway). I'm not really sure, and maybe someone could check the primitive behavior and the corresponding assumptions in Squeak.

Older Squeak images are not affected, as they are all 32-bit. So maybe there's no action required. In any case, making the behavior consistent across VMs and in both primitives would be nice.

Cheers,

On 3/3/2017 3:55 PM, Juan Vuletich wrote:
 


Hi Folks,

Today, I was able to get closer to the origin of the problem. The problem is when Morphic tries to do a Delay (inter-cycle-pause) of 1 or two milliseconds. (Usually they are quite larger, and the problem does not appear). With the 64-bit VM, sometimes very short delays hang the complete system. I don't know why, but this:

waitDelay _ Delay forMilliseconds: 50.
[ true ] whileTrue: [
    1000 atRandom print.
    waitDelay setDelay: 1; wait ].

is enough to hang Cuis in a short time. When trying to reproduce the problem in Squeak I had mixed results. It doesn't seem to hang there, but if I let this run for a couple of minutes and then try to halt with alt-. , then the hang occurs, and Squeak becomes irresponsibe. As a workaround, in Cuis in the inter cycle pause, I'm avoiding very short delays.

I'm not sure if the bug is in the VM, in  the image, or both, but it is still there, and it also affects Squeak. I haven't tried it, but Pharo might also have the problem (if not, it can show a possible solution).

BTW, Bert, I guess Squeak never does short delays in #interCyclePause: ...

Thanks,

On 2/6/2017 12:28 PM, Bert Freudenberg wrote:
 


Yes, I tried with Spur64. Have not seen it freeze yet on Mac in Squeak, but we possibly have not replicated the test well enough yet.

It *did* freeze in Cuis Spur64 on Mac too. 

- Bert -

On Mon, Feb 6, 2017 at 2:08 PM, Juan Vuletich <[hidden email]> wrote:
 
Were you usig Spur64? On Debian it does not freeze with CogV3 or Spur32. Only with Spur64.

Thanks,
Juan

Sent using Zoho Mail


---- On Mon, 06 Feb 2017 09:19:48 -0300 [hidden email] wrote ----

On Mon, Feb 6, 2017 at 3:15 AM, David T. Lewis <[hidden email]> wrote:
It would be interesting to try reproducing this in a Squeak image, but I
have not quite figured out how to make the Tokenish example work in Squeak.

See attachment. Did not freeze on Mac with 10 Tokenish jumping.
- Bert -
 




-- 
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
@JuanVuletich


-- 
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
@JuanVuletich
Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph&amp; nbsp; &amp; nbsp; &amp; nbsp; &amp; nbsp; Activity

David T. Lewis
 
In the VM, the millisecond clock wraps within the 32 bit integer range:

  #define MillisecondClockMask 0x1FFFFFFF

In the Cuis image, Delay class>>handleTimerEvent does this:

  nextTick := nextTick min: SmallInteger maxVal.

On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but on
a 32-bit image it is 16r3FFFFFFF.

Could that be it?

I don't really know how to test in Squeak. As you say, Squeak is now
using the microsecond clock in #handleTimerEvent. I do not see anything
in primitiveSignalAtMilliseconds that would behave any differently on
a 64 bit versus 32 bit image or VM, but I do not know how to test to
be sure.

Dave


On Mon, Mar 06, 2017 at 11:06:22AM -0300, Juan Vuletich wrote:

>  
> I finally understood what was happening. The problem with is primitive 136:
>
> primSignal: aSemaphore atMilliseconds: aSmallInteger
>     "Signal the semaphore when the millisecond clock reaches the value
> of the second argument. Fail if the first argument is neither a
> Semaphore nor nil. Essential. See Object documentation whatIsAPrimitive."
> <primitive: 136>
>     ^self primitiveFailed
>
> If the requested tick is already in the past, all the VMs except for
> Spur-64 will signal the semaphore anyway. Spur-64 won't. Taking care of
> this in the Cuis image was easy enough.
>
> Squeak seems not to be affected because it uses a newer primitive, that
> takes the large integer microsecond ticker, that might have the older
> behavior (signaling anyway). I'm not really sure, and maybe someone
> could check the primitive behavior and the corresponding assumptions in
> Squeak.
>
> Older Squeak images are not affected, as they are all 32-bit. So maybe
> there's no action required. In any case, making the behavior consistent
> across VMs and in both primitives would be nice.
>
> Cheers,
>
> On 3/3/2017 3:55 PM, Juan Vuletich wrote:
> >  
> >
> >
> >Hi Folks,
> >
> >Today, I was able to get closer to the origin of the problem. The
> >problem is when Morphic tries to do a Delay (inter-cycle-pause) of 1
> >or two milliseconds. (Usually they are quite larger, and the problem
> >does not appear). With the 64-bit VM, sometimes very short delays hang
> >the complete system. I don't know why, but this:
> >
> >waitDelay _ Delay forMilliseconds: 50.
> >[ true ] whileTrue: [
> >    1000 atRandom print.
> >    waitDelay setDelay: 1; wait ].
> >
> >is enough to hang Cuis in a short time. When trying to reproduce the
> >problem in Squeak I had mixed results. It doesn't seem to hang there,
> >but if I let this run for a couple of minutes and then try to halt
> >with alt-. , then the hang occurs, and Squeak becomes irresponsibe. As
> >a workaround, in Cuis in the inter cycle pause, I'm avoiding very
> >short delays.
> >
> >I'm not sure if the bug is in the VM, in  the image, or both, but it
> >is still there, and it also affects Squeak. I haven't tried it, but
> >Pharo might also have the problem (if not, it can show a possible
> >solution).
> >
> >BTW, Bert, I guess Squeak never does short delays in #interCyclePause: ...
> >
> >Thanks,
> >
> >On 2/6/2017 12:28 PM, Bert Freudenberg wrote:
> >>  
> >>
> >>
> >>Yes, I tried with Spur64. Have not seen it freeze yet on Mac in
> >>Squeak, but we possibly have not replicated the test well enough yet.
> >>
> >>It *did* freeze in Cuis Spur64 on Mac too.
> >>
> >>- Bert -
> >>
> >>On Mon, Feb 6, 2017 at 2:08 PM, Juan Vuletich <[hidden email]
> >><mailto:[hidden email]>> wrote:
> >>
> >>
> >>    Were you usig Spur64? On Debian it does not freeze with CogV3 or
> >>    Spur32. Only with Spur64.
> >>
> >>    Thanks,
> >>    Juan
> >>
> >>    Sent using Zoho Mail <https://www.zoho.com/mail/>
> >>
> >>
> >>        ---- On Mon, 06 Feb 2017 09:19:48 -0300 *[hidden email]
> >>        <mailto:[hidden email]> * wrote ----
> >>
> >>        On Mon, Feb 6, 2017 at 3:15 AM, David T. Lewis
> >>        <[hidden email] <mailto:[hidden email]>> wrote:
> >>
> >>            It would be interesting to try reproducing this in a
> >>            Squeak image, but I
> >>            have not quite figured out how to make the Tokenish
> >>            example work in Squeak.
> >>
> >>
> >>        See attachment. Did not freeze on Mac with 10 Tokenish jumping.
> >>        - Bert -
> >>
> >>
> >>
> >
> >
> >--
> >Juan Vuletich
> >www.cuis-smalltalk.org
> >https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
> >@JuanVuletich
>
>
> --
> Juan Vuletich
> www.cuis-smalltalk.org
> https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
> @JuanVuletich
>

Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph&amp; nbsp; &amp; nbsp; &amp; nbsp; &amp; nbsp; Activity

Juan Vuletich-3
 
Hi Dave,

Thanks for answering. Inline.

On 3/6/2017 10:50 PM, David T. Lewis wrote:

>
> In the VM, the millisecond clock wraps within the 32 bit integer range:
>
>    #define MillisecondClockMask 0x1FFFFFFF
>
> In the Cuis image, Delay class>>handleTimerEvent does this:
>
>    nextTick := nextTick min: SmallInteger maxVal.
>
> On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but on
> a 32-bit image it is 16r3FFFFFFF.
>
> Could that be it?

I wasn't aware of that, and had assumed that millisecond timer would use
the whole SmallInteger range. This might introduce a bug, that would
only appear at timer rollover, i.e. about 6 days after image startup.
I'll fix this. Thanks.

But this is a completely separated issue. The problem we saw, the
semaphore never being signaled if deadline is in the past, happens
immediately after image startup.

> I don't really know how to test in Squeak. As you say, Squeak is now
> using the microsecond clock in #handleTimerEvent. I do not see anything
> in primitiveSignalAtMilliseconds that would behave any differently on
> a 64 bit versus 32 bit image or VM, but I do not know how to test to
> be sure.
>
> Dave

Well, what follows is a way to test VM behavior. Tested in Cuis, but
should be trivial to reproduce in Squeak, as it is a VM issue. I Cuis
add (copied from Squeak):

!Time class methodsFor: 'general inquiries' stamp: 'jmv 3/7/2017 08:58:12'!
utcMicrosecondClock
     "Answer the UTC microseconds since the Smalltalk epoch (January 1st
1901, the start of the 20th century).
      The value is derived from the Posix epoch with a constant offset
corresponding to elapsed microseconds
      between the two epochs according to RFC 868."
<primitive: 240>
     ^0! !

!Delay class methodsFor: 'primitives' stamp: 'jmv 3/7/2017 08:57:45'!
primSignal: aSemaphore atUTCMicroseconds: anInteger
     "Signal the semaphore when the UTC microsecond clock reaches the
value of the second argument.
      Fail if the first argument is neither a Semaphore nor nil, or if
the second argument is not an integer.
      Essential. See Object documentation whatIsAPrimitive."
<primitive: 242>
     ^self primitiveFailed! !

Then, in a Workspace, try the following 4 doits:

s _ Semaphore new.
Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock + 10.
s wait.
'Ok' print.

s _ Semaphore new.
Delay primSignal: s atMilliseconds: Time millisecondClockValue + 10.
s wait.
'Ok' print.

s _ Semaphore new.
Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock - 10.
s wait.
'Ok' print.

s _ Semaphore new.
Delay primSignal: s atMilliseconds: Time millisecondClockValue - 10.
s wait.
'Not OK at all' print.

On Spur32, all 4 finish immediately. On Spur64, the first 3 also finish
immediately, but the fourth freezes the image. The difference in
behavior between Spur32 and Spur64 (on Linux) is indeed there.


Ok. Also tried Squeak (note that instead of #millisecondClockValue in
Squeak it is #primMillisecondClock) :

s _ Semaphore new.
Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock + 10.
s wait.
'Ok'.

s _ Semaphore new.
Delay primSignal: s atMilliseconds: Time primMillisecondClock + 10.
s wait.
'Ok'.

s _ Semaphore new.
Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock - 10.
s wait.
'Ok'.

s _ Semaphore new.
Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10.
s wait.
'Not OK at all'.

Exactly the same behavior.

I just took a look at
static void primitiveSignalAtMilliseconds(void)
in
https://raw.githubusercontent.com/OpenSmalltalk/opensmalltalk-vm/Cog/src/vm/cointerp.c
The only thing I see is that msecs is an usqInt and deltaMsecs is an
sqInt. But I'm not good enough at gcc subtleties to say if this matters
at all. I mean, it looks as if  'if (deltaMsecs < 0) {' was true on
Spur64 and false on Spur32... Or maybe the difference is in the handling
of nextWakeupUsecs ...

In any case, it looks like deadlines in the past are not supported (as
code assumes they are because of rollover...)

Thanks,

--
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
@JuanVuletich


Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph&amp; nbsp; &amp; nbsp; &amp; nbsp; &amp; nbsp; Activity

David T. Lewis
 
Hi Juan,

I confirm your Cuis test results on Squeak using several image/VM combinations,
details below.

On Tue, Mar 07, 2017 at 09:49:53AM -0300, Juan Vuletich wrote:

> Hi Dave,
>
> Thanks for answering. Inline.
>
> On 3/6/2017 10:50 PM, David T. Lewis wrote:
> >
> >In the VM, the millisecond clock wraps within the 32 bit integer range:
> >
> >   #define MillisecondClockMask 0x1FFFFFFF
> >
> >In the Cuis image, Delay class>>handleTimerEvent does this:
> >
> >   nextTick := nextTick min: SmallInteger maxVal.
> >
> >On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but on
> >a 32-bit image it is 16r3FFFFFFF.
> >
> >Could that be it?
>
> I wasn't aware of that, and had assumed that millisecond timer would use
> the whole SmallInteger range. This might introduce a bug, that would
> only appear at timer rollover, i.e. about 6 days after image startup.
> I'll fix this. Thanks.
>
> But this is a completely separated issue. The problem we saw, the
> semaphore never being signaled if deadline is in the past, happens
> immediately after image startup.
>
> >I don't really know how to test in Squeak. As you say, Squeak is now
> >using the microsecond clock in #handleTimerEvent. I do not see anything
> >in primitiveSignalAtMilliseconds that would behave any differently on
> >a 64 bit versus 32 bit image or VM, but I do not know how to test to
> >be sure.
> >
> >Dave
>
> Well, what follows is a way to test VM behavior. Tested in Cuis, but
> should be trivial to reproduce in Squeak, as it is a VM issue. I Cuis
> add (copied from Squeak):

I tried this with 4 different Squeak image/VM combinations:

- Squeak 3.8 with interpreter VM (an older image that uses millisecond
  clock for Delay)

- Squeak trunk V3 image with interpreter VM (latest version image, but
  non-Spur, updated via www.squeaksource.com/TrunkUpdateStreamV3)

- Squeak trunk 32 bit Spur

- Squeak trunk 64 bit Spur

>
> !Time class methodsFor: 'general inquiries' stamp: 'jmv 3/7/2017 08:58:12'!
> utcMicrosecondClock
>     "Answer the UTC microseconds since the Smalltalk epoch (January 1st
> 1901, the start of the 20th century).
>      The value is derived from the Posix epoch with a constant offset
> corresponding to elapsed microseconds
>      between the two epochs according to RFC 868."
> <primitive: 240>
>     ^0! !
>
> !Delay class methodsFor: 'primitives' stamp: 'jmv 3/7/2017 08:57:45'!
> primSignal: aSemaphore atUTCMicroseconds: anInteger
>     "Signal the semaphore when the UTC microsecond clock reaches the
> value of the second argument.
>      Fail if the first argument is neither a Semaphore nor nil, or if
> the second argument is not an integer.
>      Essential. See Object documentation whatIsAPrimitive."
> <primitive: 242>
>     ^self primitiveFailed! !

I tried adding these to my Squeak 3.8 image for the test. It does not
work properly because the primitive table was different back then, and
the interpreter VM is automatically adjusting for this so not calling
primitive 240 (actually it calls the old #primitiveSerialPortWrite rather
than #primitiveUTCMicrosecondClock that later replaced it).

Nevertheless, the primSignal:atMilliseconds: works, and there is no problem
with a -10 parameter, so these are included marked in the results below.

I also note that I locked up the Squeak 3.8 image a couple of times while
running various tests with bad input parameters. It is not reproduceable,
but there may be something bad about calling #primSignal:atMilliseconds:
in an image that is also using it for the Delay mechanism.

I also locked up a Spur 32 image when calling primSignal:atUTCMicroseconds:
so this may be the same problem, it may not be safe to call this when the
same method is being used for Delay handling.

>
> Then, in a Workspace, try the following 4 doits:
>
> s _ Semaphore new.
> Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock + 10.
> s wait.
> 'Ok' print.
>
> s _ Semaphore new.
> Delay primSignal: s atMilliseconds: Time millisecondClockValue + 10.
> s wait.
> 'Ok' print.
>
> s _ Semaphore new.
> Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock - 10.
> s wait.
> 'Ok' print.
>
> s _ Semaphore new.
> Delay primSignal: s atMilliseconds: Time millisecondClockValue - 10.
> s wait.
> 'Not OK at all' print.
>
> On Spur32, all 4 finish immediately. On Spur64, the first 3 also finish
> immediately, but the fourth freezes the image. The difference in
> behavior between Spur32 and Spur64 (on Linux) is indeed there.
>
>
> Ok. Also tried Squeak (note that instead of #millisecondClockValue in
> Squeak it is #primMillisecondClock) :

Test results for my four Squeak image/VM combinations are added below.


>
> s _ Semaphore new.
> Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock + 10.
> s wait.
> 'Ok'.

Squeak 3.8 => OK
Squeak trunk V3 interpreter => OK
Squeak trunk Spur 32 => OK
Squeak trunk Spur 64 => OK

>
> s _ Semaphore new.
> Delay primSignal: s atMilliseconds: Time primMillisecondClock + 10.
> s wait.
> 'Ok'.

Squeak 3.8 => OK
Squeak trunk V3 interpreter => OK
Squeak trunk Spur 32 => OK
Squeak trunk Spur 64 => OK

>
> s _ Semaphore new.
> Delay primSignal: s atUTCMicroseconds: Time utcMicrosecondClock - 10.
> s wait.
> 'Ok'.

Squeak 3.8 => primitive failed (but see note above for Squeak 3.8 using different primitive table)
Squeak trunk V3 interpreter => OK
Squeak trunk Spur 32 => OK
Squeak trunk Spur 64 => OK

>
> s _ Semaphore new.
> Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10.
> s wait.
> 'Not OK at all'.

Squeak 3.8 => OK
Squeak trunk V3 interpreter => OK
Squeak trunk Spur 32 => OK
Squeak trunk Spur 64 => Not OK at all, hangs image

>
> Exactly the same behavior.

Confirmed.

>
> I just took a look at
> static void primitiveSignalAtMilliseconds(void)
> in
> https://raw.githubusercontent.com/OpenSmalltalk/opensmalltalk-vm/Cog/src/vm/cointerp.c
> The only thing I see is that msecs is an usqInt and deltaMsecs is an
> sqInt. But I'm not good enough at gcc subtleties to say if this matters
> at all. I mean, it looks as if  'if (deltaMsecs < 0) {' was true on
> Spur64 and false on Spur32... Or maybe the difference is in the handling
> of nextWakeupUsecs ...

I see that ioMSecs() is declared as signed long (32 bits), but it is used in
expression with a 64 bit usqInt. So maybe it needs a cast, or maybe the variables
like msecs and deltaMsecs in primitiveSignalAtMilliseconds should be declared
as 32 bit long and unsigned long to match the actual usage.

Unfortunately I cannot easily recompile to verify (build problems on my Ubuntu
for Cog/Spur, sorry), but maybe someone else can take a look at this?

>
> In any case, it looks like deadlines in the past are not supported (as
> code assumes they are because of rollover...)

I agree this looks like a bug in the 64 bit VMs. But I do not yet see the
reason for it.

Dave

Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph&amp; nbsp; &amp; nbsp; &amp; nbsp; &amp; nbsp; Activity

Eliot Miranda-2
In reply to this post by Juan Vuletich-3
 
Hi Juan, Hi All,

  

On Mon, Mar 6, 2017 at 6:06 AM, Juan Vuletich <[hidden email]> wrote:
 
I finally understood what was happening. The problem with is primitive 136:

primSignal: aSemaphore atMilliseconds: aSmallInteger
    "Signal the semaphore when the millisecond clock reaches the value of the second argument. Fail if the first argument is neither a Semaphore nor nil. Essential. See Object documentation whatIsAPrimitive."
    <primitive: 136>
    ^self primitiveFailed

Yes, but primitive 136 *should not be used* :-).  You should use primitive 242:
primSignal: aSemaphore atUTCMicroseconds: anInteger
"Signal the semaphore when the UTC microsecond clock reaches the value of the second argument.
Fail if the first argument is neither a Semaphore nor nil, or if the second argument is not an integer.
Essential. See Object documentation whatIsAPrimitive."
<primitive: 242>
^self primitiveFailed

The use of the UTC microsecond count eliminates any and all wrapping issues for about 50,000 years [puhleaze Mr Pruitt can you try and do your bit to make it possible that we'll make it that far. hic. ed.].  The old primitive 136 primSignal:atMilliseconds: is subject to wrapping problems.  I see that primSignal:atMilliseconds: is still used in Squeak 5.x but only to turn off signaling:

Delay class>>shutDown
"Suspend the active delay, if any, before snapshotting. It will be reactived when the snapshot is resumed."
"Details: This prevents a timer interrupt from waking up the active delay in the midst snapshoting, since the active delay will be restarted when resuming the snapshot and we don't want to process the delay twice."

AccessProtect wait.
self primSignal: nil atMilliseconds: 0.
self saveResumptionTimes.
DelaySuspended := true.

Delay class>>stopTimerInterruptWatcher
"Reset the class variables that keep track of active Delays and re-start the timer interrupt watcher process. Any currently scheduled delays are forgotten."
"Delay startTimerInterruptWatcher"
self primSignal: nil atMilliseconds: 0.
TimingSemaphore ifNotNil:[TimingSemaphore terminateProcess].

I will change this now.


HTH


If the requested tick is already in the past, all the VMs except for Spur-64 will signal the semaphore anyway. Spur-64 won't. Taking care of this in the Cuis image was easy enough.

Squeak seems not to be affected because it uses a newer primitive, that takes the large integer microsecond ticker, that might have the older behavior (signaling anyway). I'm not really sure, and maybe someone could check the primitive behavior and the corresponding assumptions in Squeak.

Older Squeak images are not affected, as they are all 32-bit. So maybe there's no action required. In any case, making the behavior consistent across VMs and in both primitives would be nice.

Cheers,

On 3/3/2017 3:55 PM, Juan Vuletich wrote:
 


Hi Folks,

Today, I was able to get closer to the origin of the problem. The problem is when Morphic tries to do a Delay (inter-cycle-pause) of 1 or two milliseconds. (Usually they are quite larger, and the problem does not appear). With the 64-bit VM, sometimes very short delays hang the complete system. I don't know why, but this:

waitDelay _ Delay forMilliseconds: 50.
[ true ] whileTrue: [
    1000 atRandom print.
    waitDelay setDelay: 1; wait ].

is enough to hang Cuis in a short time. When trying to reproduce the problem in Squeak I had mixed results. It doesn't seem to hang there, but if I let this run for a couple of minutes and then try to halt with alt-. , then the hang occurs, and Squeak becomes irresponsibe. As a workaround, in Cuis in the inter cycle pause, I'm avoiding very short delays.

I'm not sure if the bug is in the VM, in  the image, or both, but it is still there, and it also affects Squeak. I haven't tried it, but Pharo might also have the problem (if not, it can show a possible solution).

BTW, Bert, I guess Squeak never does short delays in #interCyclePause: ...

Thanks,

On 2/6/2017 12:28 PM, Bert Freudenberg wrote:
 


Yes, I tried with Spur64. Have not seen it freeze yet on Mac in Squeak, but we possibly have not replicated the test well enough yet.

It *did* freeze in Cuis Spur64 on Mac too. 

- Bert -

On Mon, Feb 6, 2017 at 2:08 PM, Juan Vuletich <[hidden email]> wrote:
 
Were you usig Spur64? On Debian it does not freeze with CogV3 or Spur32. Only with Spur64.

Thanks,
Juan

Sent using Zoho Mail


---- On Mon, 06 Feb 2017 09:19:48 -0300 [hidden email] wrote ----

On Mon, Feb 6, 2017 at 3:15 AM, David T. Lewis <[hidden email]> wrote:
It would be interesting to try reproducing this in a Squeak image, but I
have not quite figured out how to make the Tokenish example work in Squeak.

See attachment. Did not freeze on Mac with 10 Tokenish jumping.
- Bert -
 




-- 
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
@JuanVuletich


-- 
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
@JuanVuletich




--
_,,,^..^,,,_
best, Eliot
Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph&amp; nbsp; &amp; nbsp; &amp; nbsp; &amp; nbsp; Activity

Eliot Miranda-2
In reply to this post by David T. Lewis
 
Hi David,

On Mon, Mar 6, 2017 at 5:50 PM, David T. Lewis <[hidden email]> wrote:

In the VM, the millisecond clock wraps within the 32 bit integer range:

  #define MillisecondClockMask 0x1FFFFFFF

In the Cuis image, Delay class>>handleTimerEvent does this:

  nextTick := nextTick min: SmallInteger maxVal.

On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but on
a 32-bit image it is 16r3FFFFFFF.

Could that be it?

I don't really know how to test in Squeak. As you say, Squeak is now
using the microsecond clock in #handleTimerEvent. I do not see anything
in primitiveSignalAtMilliseconds that would behave any differently on
a 64 bit versus 32 bit image or VM, but I do not know how to test to
be sure.

I bet that the following code from primitiveSignalAtMilliseconds ends up wrapping when given
    Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10.

deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask).
deltaMsecs < 0 ifTrue:
[deltaMsecs := deltaMsecs + MillisecondClockMask + 1].
nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000).

and I bet you'll find that the VM will wake up in about 6 days and 5 hours ;-)


I suppose we could fix this, but I'm *much* happier to simply not use primitiveSignalAtMilliseconds and stay with the simpler and wrapping-immune primitiveSignalAtUTCMicroseconds


Dave


On Mon, Mar 06, 2017 at 11:06:22AM -0300, Juan Vuletich wrote:
>
> I finally understood what was happening. The problem with is primitive 136:
>
> primSignal: aSemaphore atMilliseconds: aSmallInteger
>     "Signal the semaphore when the millisecond clock reaches the value
> of the second argument. Fail if the first argument is neither a
> Semaphore nor nil. Essential. See Object documentation whatIsAPrimitive."
> <primitive: 136>
>     ^self primitiveFailed
>
> If the requested tick is already in the past, all the VMs except for
> Spur-64 will signal the semaphore anyway. Spur-64 won't. Taking care of
> this in the Cuis image was easy enough.
>
> Squeak seems not to be affected because it uses a newer primitive, that
> takes the large integer microsecond ticker, that might have the older
> behavior (signaling anyway). I'm not really sure, and maybe someone
> could check the primitive behavior and the corresponding assumptions in
> Squeak.
>
> Older Squeak images are not affected, as they are all 32-bit. So maybe
> there's no action required. In any case, making the behavior consistent
> across VMs and in both primitives would be nice.
>
> Cheers,
>
> On 3/3/2017 3:55 PM, Juan Vuletich wrote:
> >
> >
> >
> >Hi Folks,
> >
> >Today, I was able to get closer to the origin of the problem. The
> >problem is when Morphic tries to do a Delay (inter-cycle-pause) of 1
> >or two milliseconds. (Usually they are quite larger, and the problem
> >does not appear). With the 64-bit VM, sometimes very short delays hang
> >the complete system. I don't know why, but this:
> >
> >waitDelay _ Delay forMilliseconds: 50.
> >[ true ] whileTrue: [
> >    1000 atRandom print.
> >    waitDelay setDelay: 1; wait ].
> >
> >is enough to hang Cuis in a short time. When trying to reproduce the
> >problem in Squeak I had mixed results. It doesn't seem to hang there,
> >but if I let this run for a couple of minutes and then try to halt
> >with alt-. , then the hang occurs, and Squeak becomes irresponsibe. As
> >a workaround, in Cuis in the inter cycle pause, I'm avoiding very
> >short delays.
> >
> >I'm not sure if the bug is in the VM, in  the image, or both, but it
> >is still there, and it also affects Squeak. I haven't tried it, but
> >Pharo might also have the problem (if not, it can show a possible
> >solution).
> >
> >BTW, Bert, I guess Squeak never does short delays in #interCyclePause: ...
> >
> >Thanks,
> >
> >On 2/6/2017 12:28 PM, Bert Freudenberg wrote:
> >>
> >>
> >>
> >>Yes, I tried with Spur64. Have not seen it freeze yet on Mac in
> >>Squeak, but we possibly have not replicated the test well enough yet.
> >>
> >>It *did* freeze in Cuis Spur64 on Mac too.
> >>
> >>- Bert -
> >>
> >>On Mon, Feb 6, 2017 at 2:08 PM, Juan Vuletich <[hidden email]
> >><mailto:[hidden email]>> wrote:
> >>
> >>
> >>    Were you usig Spur64? On Debian it does not freeze with CogV3 or
> >>    Spur32. Only with Spur64.
> >>
> >>    Thanks,
> >>    Juan
> >>
> >>    Sent using Zoho Mail <https://www.zoho.com/mail/>
> >>
> >>
> >>        ---- On Mon, 06 Feb 2017 09:19:48 -0300 *[hidden email]
> >>        <mailto:[hidden email]> * wrote ----
> >>
> >>        On Mon, Feb 6, 2017 at 3:15 AM, David T. Lewis
> >>        <[hidden email] <mailto:[hidden email]>> wrote:
> >>
> >>            It would be interesting to try reproducing this in a
> >>            Squeak image, but I
> >>            have not quite figured out how to make the Tokenish
> >>            example work in Squeak.
> >>
> >>
> >>        See attachment. Did not freeze on Mac with 10 Tokenish jumping.
> >>        - Bert -
> >>
> >>
> >>
> >
> >
> >--
> >Juan Vuletich
> >www.cuis-smalltalk.org
> >https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
> >@JuanVuletich
>
>
> --
> Juan Vuletich
> www.cuis-smalltalk.org
> https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
> @JuanVuletich
>




--
_,,,^..^,,,_
best, Eliot
Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph&amp; nbsp; &amp; nbsp; &amp; nbsp; &amp; nbsp; Activity

David T. Lewis
 
On Thu, Mar 09, 2017 at 12:41:44PM -0800, Eliot Miranda wrote:

>  
> Hi David,
>
> On Mon, Mar 6, 2017 at 5:50 PM, David T. Lewis <[hidden email]> wrote:
>
> >
> > In the VM, the millisecond clock wraps within the 32 bit integer range:
> >
> >   #define MillisecondClockMask 0x1FFFFFFF
> >
> > In the Cuis image, Delay class>>handleTimerEvent does this:
> >
> >   nextTick := nextTick min: SmallInteger maxVal.
> >
> > On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but on
> > a 32-bit image it is 16r3FFFFFFF.
> >
> > Could that be it?
> >
> > I don't really know how to test in Squeak. As you say, Squeak is now
> > using the microsecond clock in #handleTimerEvent. I do not see anything
> > in primitiveSignalAtMilliseconds that would behave any differently on
> > a 64 bit versus 32 bit image or VM, but I do not know how to test to
> > be sure.
> >
>
> I bet that the following code from primitiveSignalAtMilliseconds ends up
> wrapping when given
>     Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10.
>
> deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask).
> deltaMsecs < 0 ifTrue:
> [deltaMsecs := deltaMsecs + MillisecondClockMask + 1].
> nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000).
>
> and I bet you'll find that the VM will wake up in about 6 days and 5 hours
> ;-)
>

No. The failure is specific to the 64 bit VM. Source code for the primitive
is the same in either case.

>
> I suppose we could fix this, but I'm *much* happier to simply not use
> primitiveSignalAtMilliseconds and stay with the simpler and wrapping-immune
> primitiveSignalAtUTCMicroseconds

Fair enough, but given that there is a demonstrated bug that affects only
the 64-bit VM, and given that it expresses itself intermittently and in ways
that affect only someone who is attempting to migrate their existing V3 image
to Spur, then I would say that it makes very good sense to take the time
to fix it if we are able to do so. After all, there may be other people who
will want to migrate V3 images to Spur, and there is no point in making the
process needlessly difficult.

I do not have the solution, but maybe someone else can help. So I am asking
for help here. Can someone with a working 64-bit build environment please
check and see if what I said in an earlier email might make a difference:

  I see that ioMSecs() is declared as signed long (32 bits), but it is used
  in expression with a 64 bit usqInt. So maybe it needs a cast, or maybe
  the variables like msecs and deltaMsecs in primitiveSignalAtMilliseconds
  should be declared as 32 bit long and unsigned long to match the actual usage.

Thanks,

Dave

Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph&amp; nbsp; &amp; nbsp; &amp; nbsp; &amp; nbsp; Activity

Nicolas Cellier
 


2017-03-10 3:47 GMT+01:00 David T. Lewis <[hidden email]>:

On Thu, Mar 09, 2017 at 12:41:44PM -0800, Eliot Miranda wrote:
>
> Hi David,
>
> On Mon, Mar 6, 2017 at 5:50 PM, David T. Lewis <[hidden email]> wrote:
>
> >
> > In the VM, the millisecond clock wraps within the 32 bit integer range:
> >
> >   #define MillisecondClockMask 0x1FFFFFFF
> >
> > In the Cuis image, Delay class>>handleTimerEvent does this:
> >
> >   nextTick := nextTick min: SmallInteger maxVal.
> >
> > On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but on
> > a 32-bit image it is 16r3FFFFFFF.
> >
> > Could that be it?
> >
> > I don't really know how to test in Squeak. As you say, Squeak is now
> > using the microsecond clock in #handleTimerEvent. I do not see anything
> > in primitiveSignalAtMilliseconds that would behave any differently on
> > a 64 bit versus 32 bit image or VM, but I do not know how to test to
> > be sure.
> >
>
> I bet that the following code from primitiveSignalAtMilliseconds ends up
> wrapping when given
>     Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10.
>
> deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask).
> deltaMsecs < 0 ifTrue:
> [deltaMsecs := deltaMsecs + MillisecondClockMask + 1].
> nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000).
>
> and I bet you'll find that the VM will wake up in about 6 days and 5 hours
> ;-)
>

No. The failure is specific to the 64 bit VM. Source code for the primitive
is the same in either case.


>
> I suppose we could fix this, but I'm *much* happier to simply not use
> primitiveSignalAtMilliseconds and stay with the simpler and wrapping-immune
> primitiveSignalAtUTCMicroseconds

Fair enough, but given that there is a demonstrated bug that affects only
the 64-bit VM, and given that it expresses itself intermittently and in ways
that affect only someone who is attempting to migrate their existing V3 image
to Spur, then I would say that it makes very good sense to take the time
to fix it if we are able to do so. After all, there may be other people who
will want to migrate V3 images to Spur, and there is no point in making the
process needlessly difficult.

I do not have the solution, but maybe someone else can help. So I am asking
for help here. Can someone with a working 64-bit build environment please
check and see if what I said in an earlier email might make a difference:

  I see that ioMSecs() is declared as signed long (32 bits), but it is used
  in expression with a 64 bit usqInt. So maybe it needs a cast, or maybe
  the variables like msecs and deltaMsecs in primitiveSignalAtMilliseconds
  should be declared as 32 bit long and unsigned long to match the actual usage.

Thanks,

Dave



Hi David
I just reviewed the generated code for primitiveSignalAtMilliseconds andhere is what i found:

The primitive will simply fail if fed with a negative integer, thanks to:

        msecs = positive32BitValueOf(msecsObj);

msecs is allways positive, since declared as
    usqInt msecs;

So far, so good.

Then
                        deltaMsecs = msecs - ((ioMSecs()) & MillisecondClockMask);
                        if (deltaMsecs < 0) {
                                deltaMsecs = (deltaMsecs + MillisecondClockMask) + 1;
                        }
                        GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) + (deltaMsecs * 1000);

Due to the 29 bits Mask, the bitAnd: (&) operation will result into a positive int.
0 <= ((ioMSecs()) & MillisecondClockMask) <= 0x1FFFFFFF

However, when we do the Roll-over thing:
                                deltaMsecs = (deltaMsecs + MillisecondClockMask) + 1;
Then we create a giant delay (the 6 days mentionned by Eliot) which is HIGHLY QUESTIONNABLE whatever 32 or 64 bits!

The difference between 32 and 64 bits is in the next line:
                        GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) + (deltaMsecs * 1000);
Because
    sqInt deltaMsecs;

is 32 bits for a 32bits spur, 64bits for 64bits spur, then (deltaMsecs * 1000) will do two different things:
- in 32 bits it will overflow leading to a negative delay and a nextWakeupUsecs in the past (but that's invoking Undefined Behavior)
- in 64 bits it won't overflow leading to he 6 days delay

Does that explain?

if (deltaMsecs < 0) I would either fail the primitive or set the deltaMsecs to zero.
What do you think?

 Nicolas
 
Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph&amp; nbsp; &amp; nbsp; &amp; nbsp; &amp; nbsp; Activity

Nicolas Cellier
 


2017-03-10 9:45 GMT+01:00 Nicolas Cellier <[hidden email]>:


2017-03-10 3:47 GMT+01:00 David T. Lewis <[hidden email]>:

On Thu, Mar 09, 2017 at 12:41:44PM -0800, Eliot Miranda wrote:
>
> Hi David,
>
> On Mon, Mar 6, 2017 at 5:50 PM, David T. Lewis <[hidden email]> wrote:
>
> >
> > In the VM, the millisecond clock wraps within the 32 bit integer range:
> >
> >   #define MillisecondClockMask 0x1FFFFFFF
> >
> > In the Cuis image, Delay class>>handleTimerEvent does this:
> >
> >   nextTick := nextTick min: SmallInteger maxVal.
> >
> > On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but on
> > a 32-bit image it is 16r3FFFFFFF.
> >
> > Could that be it?
> >
> > I don't really know how to test in Squeak. As you say, Squeak is now
> > using the microsecond clock in #handleTimerEvent. I do not see anything
> > in primitiveSignalAtMilliseconds that would behave any differently on
> > a 64 bit versus 32 bit image or VM, but I do not know how to test to
> > be sure.
> >
>
> I bet that the following code from primitiveSignalAtMilliseconds ends up
> wrapping when given
>     Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10.
>
> deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask).
> deltaMsecs < 0 ifTrue:
> [deltaMsecs := deltaMsecs + MillisecondClockMask + 1].
> nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000).
>
> and I bet you'll find that the VM will wake up in about 6 days and 5 hours
> ;-)
>

No. The failure is specific to the 64 bit VM. Source code for the primitive
is the same in either case.


>
> I suppose we could fix this, but I'm *much* happier to simply not use
> primitiveSignalAtMilliseconds and stay with the simpler and wrapping-immune
> primitiveSignalAtUTCMicroseconds

Fair enough, but given that there is a demonstrated bug that affects only
the 64-bit VM, and given that it expresses itself intermittently and in ways
that affect only someone who is attempting to migrate their existing V3 image
to Spur, then I would say that it makes very good sense to take the time
to fix it if we are able to do so. After all, there may be other people who
will want to migrate V3 images to Spur, and there is no point in making the
process needlessly difficult.

I do not have the solution, but maybe someone else can help. So I am asking
for help here. Can someone with a working 64-bit build environment please
check and see if what I said in an earlier email might make a difference:

  I see that ioMSecs() is declared as signed long (32 bits), but it is used
  in expression with a 64 bit usqInt. So maybe it needs a cast, or maybe
  the variables like msecs and deltaMsecs in primitiveSignalAtMilliseconds
  should be declared as 32 bit long and unsigned long to match the actual usage.

Thanks,

Dave



Hi David
I just reviewed the generated code for primitiveSignalAtMilliseconds andhere is what i found:

The primitive will simply fail if fed with a negative integer, thanks to:

        msecs = positive32BitValueOf(msecsObj);

msecs is allways positive, since declared as
    usqInt msecs;

So far, so good.

Then
                        deltaMsecs = msecs - ((ioMSecs()) & MillisecondClockMask);
                        if (deltaMsecs < 0) {
                                deltaMsecs = (deltaMsecs + MillisecondClockMask) + 1;
                        }
                        GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) + (deltaMsecs * 1000);

Due to the 29 bits Mask, the bitAnd: (&) operation will result into a positive int.
0 <= ((ioMSecs()) & MillisecondClockMask) <= 0x1FFFFFFF

However, when we do the Roll-over thing:
                                deltaMsecs = (deltaMsecs + MillisecondClockMask) + 1;
Then we create a giant delay (the 6 days mentionned by Eliot) which is HIGHLY QUESTIONNABLE whatever 32 or 64 bits!

The difference between 32 and 64 bits is in the next line:
                        GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) + (deltaMsecs * 1000);
Because
    sqInt deltaMsecs;

is 32 bits for a 32bits spur, 64bits for 64bits spur, then (deltaMsecs * 1000) will do two different things:
- in 32 bits it will overflow leading to a negative delay and a nextWakeupUsecs in the past (but that's invoking Undefined Behavior)
- in 64 bits it won't overflow leading to he 6 days delay

Does that explain?

if (deltaMsecs < 0) I would either fail the primitive or set the deltaMsecs to zero.
What do you think?

 Nicolas
 

Anyway, the RollOver protection sounds strange: if clock was high, and if we don't apply the MillisecondClockMask to (clock+delay),
then (clock+delay) won't overflow (thanks to largeIntegers). Eventually, for delay > 2^32-2^29 msecs, the primitive will fail thru (positive32BitValueOf).
That's something like 43 days delay max, more than enough...

Either the code should rather be something like this with roll-over protection
                        deltaMsecs = (msecs - ioMSecs()) & MillisecondClockMask;
or we should leave deltaMsecs as is and reject negative delays

Also note this in 32 bits:
Since *1000 consumes 10bits, then delays above 21 bits msecs will overflow and be signalled immediately.
Tha's about 2e6 msecs, 2e3 secs. Thus the maximum delay supported by this primitive is a bit less than 36 minutes (16r80000000/1000/1000/60).
So maybe we should also reject large positive "unreasonable" delays, that isanthing above a few minutes and document it.


Reply | Threaded
Open this post in threaded view
|

Re: We need help from VM experts. Re: Freeze after Morph&amp; nbsp; &amp; nbsp; &amp; nbsp; &amp; nbsp; Activity

David T. Lewis
In reply to this post by Nicolas Cellier
 
On Fri, Mar 10, 2017 at 09:45:07AM +0100, Nicolas Cellier wrote:

>  
> 2017-03-10 3:47 GMT+01:00 David T. Lewis <[hidden email]>:
>
> >
> > On Thu, Mar 09, 2017 at 12:41:44PM -0800, Eliot Miranda wrote:
> > >
> > > Hi David,
> > >
> > > On Mon, Mar 6, 2017 at 5:50 PM, David T. Lewis <[hidden email]>
> > wrote:
> > >
> > > >
> > > > In the VM, the millisecond clock wraps within the 32 bit integer range:
> > > >
> > > >   #define MillisecondClockMask 0x1FFFFFFF
> > > >
> > > > In the Cuis image, Delay class>>handleTimerEvent does this:
> > > >
> > > >   nextTick := nextTick min: SmallInteger maxVal.
> > > >
> > > > On a 64-bit Spur image, SmallInteger maxVal is 16rFFFFFFFFFFFFFFF, but
> > on
> > > > a 32-bit image it is 16r3FFFFFFF.
> > > >
> > > > Could that be it?
> > > >
> > > > I don't really know how to test in Squeak. As you say, Squeak is now
> > > > using the microsecond clock in #handleTimerEvent. I do not see anything
> > > > in primitiveSignalAtMilliseconds that would behave any differently on
> > > > a 64 bit versus 32 bit image or VM, but I do not know how to test to
> > > > be sure.
> > > >
> > >
> > > I bet that the following code from primitiveSignalAtMilliseconds ends up
> > > wrapping when given
> > >     Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10.
> > >
> > > deltaMsecs := msecs - (self ioMSecs bitAnd: MillisecondClockMask).
> > > deltaMsecs < 0 ifTrue:
> > > [deltaMsecs := deltaMsecs + MillisecondClockMask + 1].
> > > nextWakeupUsecs := self ioUTCMicroseconds + (deltaMsecs * 1000).
> > >
> > > and I bet you'll find that the VM will wake up in about 6 days and 5
> > hours
> > > ;-)
> > >
> >
> > No. The failure is specific to the 64 bit VM. Source code for the primitive
> > is the same in either case.
> >
> >
> >
> > > I suppose we could fix this, but I'm *much* happier to simply not use
> > > primitiveSignalAtMilliseconds and stay with the simpler and
> > wrapping-immune
> > > primitiveSignalAtUTCMicroseconds
> >
> > Fair enough, but given that there is a demonstrated bug that affects only
> > the 64-bit VM, and given that it expresses itself intermittently and in
> > ways
> > that affect only someone who is attempting to migrate their existing V3
> > image
> > to Spur, then I would say that it makes very good sense to take the time
> > to fix it if we are able to do so. After all, there may be other people who
> > will want to migrate V3 images to Spur, and there is no point in making the
> > process needlessly difficult.
> >
> > I do not have the solution, but maybe someone else can help. So I am asking
> > for help here. Can someone with a working 64-bit build environment please
> > check and see if what I said in an earlier email might make a difference:
> >
> >   I see that ioMSecs() is declared as signed long (32 bits), but it is used
> >   in expression with a 64 bit usqInt. So maybe it needs a cast, or maybe
> >   the variables like msecs and deltaMsecs in primitiveSignalAtMilliseconds
> >   should be declared as 32 bit long and unsigned long to match the actual
> > usage.
> >
> > Thanks,
> >
> > Dave
> >
> >
>
> Hi David
> I just reviewed the generated code for primitiveSignalAtMilliseconds
> andhere is what i found:
>
> The primitive will simply fail if fed with a negative integer, thanks to:
>
>         msecs = positive32BitValueOf(msecsObj);
>
> msecs is allways positive, since declared as
>     usqInt msecs;
>
> So far, so good.
>
> Then
>                         deltaMsecs = msecs - ((ioMSecs()) &
> MillisecondClockMask);
>                         if (deltaMsecs < 0) {
>                                 deltaMsecs = (deltaMsecs +
> MillisecondClockMask) + 1;
>                         }
>                         GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) +
> (deltaMsecs * 1000);
>
> Due to the 29 bits Mask, the bitAnd: (&) operation will result into a
> positive int.
> 0 <= ((ioMSecs()) & MillisecondClockMask) <= 0x1FFFFFFF
>
> However, when we do the Roll-over thing:
>                                 deltaMsecs = (deltaMsecs +
> MillisecondClockMask) + 1;
> Then we create a giant delay (the 6 days mentionned by Eliot) which is
> HIGHLY QUESTIONNABLE whatever 32 or 64 bits!
>
> The difference between 32 and 64 bits is in the next line:
>                         GIV(nextWakeupUsecs) = (ioUTCMicroseconds()) +
> (deltaMsecs * 1000);
> Because
>     sqInt deltaMsecs;
>
> is 32 bits for a 32bits spur, 64bits for 64bits spur, then (deltaMsecs *
> 1000) will do two different things:
> - in 32 bits it will overflow leading to a negative delay and a
> nextWakeupUsecs in the past (but that's invoking Undefined Behavior)
> - in 64 bits it won't overflow leading to he 6 days delay
>
> Does that explain?
>
> if (deltaMsecs < 0) I would either fail the primitive or set the deltaMsecs
> to zero.
> What do you think?

Hi Nicolas,

Thanks very much for looking into this and for the explanation. I am away
right now so I cannot look at it now, but it does seem likely to be a sign
extension issue at least in part.

If you have a chance to try recompiling with changes to the generated code,
Juan's test condition is just this:

  s := Semaphore new.
  Delay primSignal: s atMilliseconds: Time primMillisecondClock - 10.
  s wait.

This locks the image on 64-bits, and not on 32-bits.

Thanks a lot!
Dave

1234