[OpenSmalltalk/opensmalltalk-vm] squeak.cog.v3 CI tests fail randomly (#436)


David T Lewis
 

While running the tests (Squeak 4.6) on macos32x86, the failure shows up in the console as a repeated:

select: Invalid argument
errno 22


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.


Re: [OpenSmalltalk/opensmalltalk-vm] squeak.cog.v3 CI tests fail randomly (#436)

David T Lewis
 

I cannot reproduce this on my own Mac...
Workaround: allow failure for this flavor; it's a legacy VM, not a showstopper.



Re: [OpenSmalltalk/opensmalltalk-vm] squeak.cog.v3 CI tests fail randomly (#436)

David T Lewis
In reply to this post by David T Lewis
 

The only call to perror("select") is in platforms/unix/vm/aio.c
(the others, in minheadless and RiscOS, do not count).

That's inside aioPoll(long microSeconds).
What could happen there is an invalid timeval:
either a negative microSeconds, or a timeout greater than the maximum allowed (implementation-defined).
The latter cannot happen: microSeconds is a long, and since this is a 32-bit VM, the largest value we can pass is about 2e9 microseconds (a bit more than 2000 seconds), far below the limit (the select man page says the limit is at least 31 days, i.e. more than 2e6 seconds!).
Inside the loop there is a guard against microSeconds <= 0, so the only way to get an invalid timeval is to call aioPoll with a negative timeout.
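To see why a negative timeout trips select(), here is a minimal C sketch of turning a microsecond count into the struct timeval that select() receives. The helper name usecs_to_timeval is hypothetical, and the seconds/microseconds split is an assumption about what aio.c does, not a copy of it. Because C division truncates toward zero, a negative input yields a negative tv_usec, and select() rejects such a timeval with EINVAL (errno 22).

```c
#include <sys/time.h>

/* Hypothetical helper (not the actual aio.c code): split a microsecond
   timeout into the seconds/microseconds pair that select() expects. */
static struct timeval usecs_to_timeval(long microSeconds)
{
    struct timeval tv;
    /* C division truncates toward zero, so a negative input keeps its sign:
       -2000 becomes { 0, -2000 }, which is not a valid timeval. */
    tv.tv_sec  = microSeconds / 1000000;
    tv.tv_usec = microSeconds % 1000000;
    return tv;
}
```

A positive input such as 2500000 gives { 2, 500000 } as expected; only a negative input produces the "select: Invalid argument" case.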

Maybe we should check whether there is an edge case in the cog.v3 delay handling...
It might be that primitiveSignalAtMilliseconds(void) is asked for a deadline slightly in the past, producing a negative deltaMsecs that gets wrapped to a positive value (modulo MillisecondClockMask, a 29-bit mask); multiplied by 1000, that increments nextWakeupUsecs by about 2^39 microseconds. For example, a deadline 2 ms in the past gives deltaMsecs = 16r1FFFFFFE, and the microsecond delay becomes negative when wrapped to a 32-bit long:

(16r1FFFFFFF-1*1000 \\ (1 << 32)) highBit -> 32.
16r1FFFFFFF-1*1000 \\ (1 << 32) - (1<<32) -> -2000.

Then ioRelinquishProcessorForMicroseconds will call aioSleepForUsecs(realTimeToWait) with this negative long realTimeToWait.
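The wrap-around can be reproduced outside the image. This C sketch is a hypothetical reconstruction of the arithmetic described above, not the VM source: it masks the millisecond difference to 29 bits, scales it to microseconds in 64 bits, then reduces the product modulo 2^32 the way a 32-bit two's-complement long would. The function names are made up for illustration. A deadline 2 ms in the past comes out as -2000 microseconds: since 1000 = 8 * 125 and 2^29 * 8 = 2^32, the 32-bit truncation exactly undoes the 29-bit masking.

```c
#include <stdint.h>

/* 29-bit millisecond clock mask, as named in the post. */
#define MillisecondClockMask 0x1FFFFFFF

/* Hypothetical: difference between deadline and now, masked to 29 bits.
   A deadline in the past wraps to a large positive value. */
static int64_t masked_delta_msecs(int64_t deadlineMsecs, int64_t nowMsecs)
{
    return (deadlineMsecs - nowMsecs) & MillisecondClockMask;
}

/* Emulate storing a 64-bit value into a 32-bit two's-complement long:
   keep the low 32 bits and reinterpret them as signed. */
static int32_t truncate_to_int32(int64_t value)
{
    uint32_t low = (uint32_t)value;   /* conversion to unsigned is modulo 2^32 */
    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;
    return (int32_t)((int64_t)low - 4294967296LL);
}
```

For instance, masked_delta_msecs(1000, 1002) is 0x1FFFFFFE; times 1000 that is 536870910000, and truncate_to_int32 of that product is -2000.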

That's only a theory from reading the code and it has to be confirmed, but it seems to me that we already fixed this kind of flaw in the past... Maybe not for cog.v3?



Re: [OpenSmalltalk/opensmalltalk-vm] squeak.cog.v3 CI tests fail randomly (#436)

David T Lewis
 

So, there was a thread from March 2017 explaining the pitfalls in primitiveSignalAtMilliseconds:

Re: [Vm-dev] We need help from VM experts. Re: Freeze after Morph Activity
http://lists.squeakfoundation.org/pipermail/vm-dev/2017-March/024381.html

In particular, everything was said here: the rollover workaround is HIGHLY suspicious:
http://lists.squeakfoundation.org/pipermail/vm-dev/2017-March/024410.html

So we solved it, or rather, IMO, we didn't solve it, we just patched it:
http://lists.squeakfoundation.org/pipermail/vm-dev/2017-March/024438.html

As I pointed out later, there were still problems:
http://lists.squeakfoundation.org/pipermail/vm-dev/2017-May/025160.html

But our attention was on more important developments (Spur, Win64, etc.), and about a year after the first report we patched it by replacing usqLong deltaMsecs with sqLong deltaMsecs:
http://lists.squeakfoundation.org/pipermail/vm-dev/2018-March/027041.html

The compiler warning disappeared, but we kept the rollover handling that I once suggested removing!
We eliminated every smell of the problem, but not the problem itself, which remained buried and quiet for some time (as expected, since squeak.cog.v3 is used only occasionally, if at all).
I'm ready to bet a few hundred dollars that we can still trigger the problem in cog.v3, as Juan once did, and that our CI failures are related. I don't have an old image handy at this keyboard, but if anyone wants to try this in squeak.cog.v3, watch what the console prints:

s := Semaphore new.
Delay primSignal: s atMilliseconds: Time primMillisecondClock - 2.
s wait.
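Why switching deltaMsecs from usqLong to sqLong could only silence the compiler warning: once the difference is masked to 29 bits, its declared signedness no longer matters, because the result is always non-negative. A minimal C sketch, assuming the mask is applied as described earlier in the thread; uint64_t/int64_t stand in for the VM's usqLong/sqLong typedefs, and the function names are hypothetical.

```c
#include <stdint.h>

#define MillisecondClockMask 0x1FFFFFFF

/* Hypothetical reconstruction, not the VM source: unsigned delta, masked. */
static uint64_t delta_as_usqLong(int64_t deadline, int64_t now)
{
    uint64_t deltaMsecs = (uint64_t)(deadline - now) & MillisecondClockMask;
    return deltaMsecs;
}

/* Same computation with a signed variable: the mask still clears the sign,
   so a deadline in the past is still read as "about six days from now". */
static int64_t delta_as_sqLong(int64_t deadline, int64_t now)
{
    int64_t deltaMsecs = (deadline - now) & MillisecondClockMask;
    return deltaMsecs;
}
```

With a deadline 2 ms in the past, both variants return 0x1FFFFFFE.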



Re: [OpenSmalltalk/opensmalltalk-vm] squeak.cog.v3 CI tests fail randomly (#436)

David T Lewis
 

Hehe, I couldn't resist: I tried the snippet above on another machine with a Squeak-4.6.image available, and it triggers an endless stream of:

select: Invalid argument
errno 22
select: Invalid argument
errno 22
select: Invalid argument
errno 22

So now I know the origin (the overzealous rollover protection) and how to fix it.
I just wonder whether it is worth spending time on a legacy VM...



Re: [OpenSmalltalk/opensmalltalk-vm] squeak.cog.v3 CI tests fail randomly (#436)

David T Lewis
 

A fix is on its way; first stage at:
http://source.squeak.org/VMMaker/VMMaker.oscog-nice.2572.diff



Re: [OpenSmalltalk/opensmalltalk-vm] squeak.cog.v3 CI tests fail randomly (#436)

David T Lewis
 

Wait, bad fix: with an expired delay we no longer get the errno 22 litany in the console, but the image hangs forever (the Semaphore never gets signalled).
http://source.squeak.org/VMMaker/VMMaker.oscog-nice.2573.diff



Re: [OpenSmalltalk/opensmalltalk-vm] squeak.cog.v3 CI tests fail randomly (#436)

David T Lewis
 

Ah, I see that the roll-over case is already handled on the image side, at the end of handleTimerEvent:

...snip...
"Since we have processed all outstanding requests, reset the timing semaphore so
that only new work will wake us up again. Do this RIGHT BEFORE setting the next
wakeup call from the VM because it is only signaled once so we mustn't miss it."
TimingSemaphore initSignals.
Delay primSignal: TimingSemaphore atMilliseconds: nextTick.

"This last test is necessary for the obscure case that the msecs clock rolls over
after nowTick has been computed (unlikely but not impossible). In this case we'd
wait for MillisecondClockMask msecs (roughly six days) or until another delay gets
scheduled (which may not be any time soon). In any case, since handling the
condition is easy, let's just deal with it"
Time millisecondClockValue < nowTick ifTrue:[TimingSemaphore signal]. "retry"

I added the protection to prevent long overflow in ioRelinquishProcessorForMicroseconds.
The protection is not strictly necessary after b80dd63.
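One plausible shape for such a protection is a saturating conversion. This is an assumption for illustration, not the committed code (the linked diffs hold that), and clamp_usecs_to_long32 is a hypothetical name: instead of letting a 64-bit microsecond count wrap when it is stored in a 32-bit long, clamp it, so anything already due becomes 0 and anything too large becomes INT32_MAX.

```c
#include <stdint.h>

/* Hypothetical sketch: convert a 64-bit microsecond wait into a value that
   fits a 32-bit long, saturating instead of wrapping around. */
static int32_t clamp_usecs_to_long32(int64_t usecs)
{
    if (usecs <= 0)
        return 0;            /* deadline already reached: poll without blocking */
    if (usecs > INT32_MAX)
        return INT32_MAX;    /* about 35 minutes, the most a 32-bit long can hold */
    return (int32_t)usecs;
}
```

Saturating to INT32_MAX means the caller may wake up earlier than the real deadline; that is harmless as long as it re-checks the deadline after waking, which is an assumption about the surrounding loop.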
However, I see that sorting SuspendedDelays by resumptionTime goes BAD after a roll-over: those with a resumptionTime greater than limit/2 should come before those less than limit/2! This should be handled at the end of handleTimerEvent; otherwise these delays might never get signalled...
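One way to get that ordering is a serial-number style comparison (in the spirit of RFC 1982) on the 29-bit clock: take the difference of two resumption times modulo 2^29 and read it as signed, so a time just before the roll-over sorts ahead of a small post-roll-over time. This is a C sketch of that technique, not the image's Delay code; wrapped_diff and compare_wrapped are hypothetical names, and the ordering is only consistent while all pending resumption times lie within 2^28 ms (about three days) of each other.

```c
#include <stdint.h>
#include <stdlib.h>

#define MillisecondClockMask 0x1FFFFFFFu
#define ClockRange (MillisecondClockMask + 1u)   /* 2^29 ms, roughly six days */
#define HalfRange  (ClockRange / 2u)             /* 2^28 ms */

/* Signed distance from a to b on the wrapping 29-bit clock.
   Positive means b comes after a; the result lies in (-2^28, 2^28]. */
static int32_t wrapped_diff(uint32_t a, uint32_t b)
{
    uint32_t d = (b - a) & MillisecondClockMask;
    return d > HalfRange ? (int32_t)d - (int32_t)ClockRange : (int32_t)d;
}

/* qsort comparator: earlier (in wrapped order) resumption times first. */
static int compare_wrapped(const void *pa, const void *pb)
{
    uint32_t a = *(const uint32_t *)pa;
    uint32_t b = *(const uint32_t *)pb;
    int32_t d = wrapped_diff(a, b);
    return d > 0 ? -1 : (d < 0 ? 1 : 0);
}
```

Sorting { 5, 0x1FFFFFF0, 100 } with qsort(times, 3, sizeof times[0], compare_wrapped) puts 0x1FFFFFF0 first: the delay scheduled just before the roll-over fires before the ones scheduled just after it.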

This does not explain why CI still fails...



Re: [OpenSmalltalk/opensmalltalk-vm] squeak.cog.v3 CI tests fail randomly (#436)

David T Lewis
 

Bah, I think I can explain the failures now:
http://source.squeak.org/VMMaker/VMMaker.oscog-nice.2574.diff

I was blind: I kept thinking in higher-level Smalltalk rather than in lower-level C...



Re: [OpenSmalltalk/opensmalltalk-vm] squeak.cog.v3 CI tests fail randomly (#436)

David T Lewis
 

Thank you, Nicolas!



Re: [OpenSmalltalk/opensmalltalk-vm] squeak.cog.v3 CI tests fail randomly (#436)

David T Lewis
 

Closed #436.

