Hello,

I came up with a fairly repeatable way to crash the Squeak3D plugin again. Just evaluate the doIt proposed in the image at: http://www.zogotounga.net/swap/crashlab4.zip

Things go wrong about every other time on my system, sometimes with a crash dump (one is included in the archive), sometimes silently. You will have to wait about 20 seconds for the animation to reach the critical point (when it stops, but sometimes before).

BTW this animation also features a number of glitches. It would be nice to fix that too, but that is another topic.

Stef |
Hi Stephane,

I first tried with a debug VM on OSX and could not trigger the failure. A fast VM, though, fails easily. Here is where it fails when launched via lldb:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x38)
    frame #0: 0x10df6c75 Squeak3D`b3dAddBackFill(fillList=0x05cff2b8, aFace=0x05cc6b18) at b3dMain.c:994:21 [opt]
   991    if(minZ <= (firstFace->minZ + lastFace->minZ) * 0.5) {
   992      /* search front to back */
   993      face = firstFace->nextFace;
-> 994      while(face->minZ < minZ) face = face->nextFace;
   995    } else {
   996      /* search back to front */
   997      face = lastFace->prevFace; /* already checked if lastFace->minZ <= minZ */

Looking at the full code of the function, I do not see any logic error in b3dAddBackFill: if faces are correctly sorted by increasing minZ, the while loop on line 994 should always exit before reaching NULL. Yet it seems that we exhaust the face chain without ever meeting the exit condition on line 994, and then dereference a NULL pointer.

Why only with the fast VM? It might be yet another case of Undefined Behavior (UB)... I have thus recompiled the VM with the UB sanitizer, and there is indeed some UB reported:

../../platforms/Cross/plugins/Squeak3D/b3dMain.c:1252:29: runtime error: left shift of negative value -760
../../platforms/Cross/plugins/Squeak3D/b3dMain.c:1254:25: runtime error: left shift of negative value -751
../../platforms/Cross/plugins/Squeak3D/b3dDraw.c:317:33: runtime error: left shift of negative value -802
../../platforms/Cross/plugins/Squeak3D/b3dDraw.c:318:33: runtime error: left shift of negative value -802
../../platforms/Cross/plugins/Squeak3D/b3dDraw.c:316:33: runtime error: left shift of negative value -114
../../platforms/Cross/plugins/Squeak3D/b3dMain.c:829:61: runtime error: left shift of negative value -2

However, the instrumented fast VM does not fail... It might be that some aggressive optimizations, which assume the absence of UB, do not occur with all the instrumentation embedded...
So it's not going to be easy to debug. The only way I see is eliminating the UB... By protecting those left shifts with an (unsigned) cast, I also made the crash disappear, which is a good clue.

IMO, declaring a left shift of a negative int UB is sort of FOOLISH. Yes, some overflow of the sign bit by a 0 could happen, EXACTLY like overflow of the sign bit by a 1 could happen for a positive value!!! For me, it should be implementation-defined at most, for those exotic machines using sign/magnitude rather than two's complement. But we cannot easily control what the standard committee decides; we have to cope with it... We will have to protect each and every left shift in b3d with a cast...

On Wed, Feb 5, 2020 at 16:28, Stéphane Rollandin <[hidden email]> wrote:
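A minimal sketch of the cast workaround described above. The helper name `shl_signed` is hypothetical and does not appear in b3dMain.c or b3dDraw.c; the actual fix may simply cast at each shift site. The sketch assumes 32-bit operands and a two's complement target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the Squeak3D sources): shift a possibly
   negative value left without the undefined behavior reported by UBSan. */
static inline int32_t shl_signed(int32_t value, unsigned shift)
{
    /* Shifting by the operand width or more is undefined as well. */
    assert(shift < 32);
    /* A left shift on an unsigned operand is well defined: the result is
       reduced modulo 2^32. Converting an out-of-range uint32_t back to
       int32_t is implementation-defined in C, not undefined; GCC and Clang
       wrap modulo 2^32, which yields the expected two's complement value. */
    return (int32_t)((uint32_t)value << shift);
}
```

With this, `shl_signed(-760, 12)` performs the shift on the unsigned bit pattern, so the optimizer has no undefined behavior to reason from. The shift amount 12 is only an illustration, not a value taken from the plugin.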
|
> Why only with fast VM? It might be yet another case of Undefined
> Behavior (UB)...
> I have thus recompiled the VM with UB sanitizer, and there is indeed
> some UB reported:
>
> ../../platforms/Cross/plugins/Squeak3D/b3dMain.c:1252:29: runtime error:
> left shift of negative value -760
> ../../platforms/Cross/plugins/Squeak3D/b3dMain.c:1254:25: runtime error:
> left shift of negative value -751
> ../../platforms/Cross/plugins/Squeak3D/b3dDraw.c:317:33: runtime error:
> left shift of negative value -802
> ../../platforms/Cross/plugins/Squeak3D/b3dDraw.c:318:33: runtime error:
> left shift of negative value -802
> ../../platforms/Cross/plugins/Squeak3D/b3dDraw.c:316:33: runtime error:
> left shift of negative value -114
> ../../platforms/Cross/plugins/Squeak3D/b3dMain.c:829:61: runtime error:
> left shift of negative value -2
>
> Though, the instrumented fast VM does not fail...
> It might be that some aggressive optimizations assuming the absence of
> UB do not occur with all the instrumentation stuff embedded...

This is very dark magic.

> IMO, declaring a left shift of negative int UB is sort of FOOLISH.

Tell me where to vote and I'll vote for you.

> We will have to protect each and every left shift in b3d with a cast...

To see a good side in this, stumbling upon this kind of error at this point must mean the 3D code itself is quite sound. Indeed, I had only a couple of similar crashes in hours of testing (well, playing).

What I also saw a couple of times, and which is more difficult to report, is the VM hanging at 100% CPU on its core and having to be killed externally. Could it be the same nasal demons at work?

Stef |
On Sat, Feb 8, 2020 at 01:35, Stéphane Rollandin <[hidden email]> wrote:
You cannot yet vote for opinions, except on some social networks ;)

> We will have to protect each and every left shift in b3d with a cast...

hard to say... I think that you can send a SIGUSR1 to dump stacks, or attach a debugger to running VM...

Unfortunately, I also had another crash:

../../platforms/Cross/plugins/Squeak3D/b3dMain.c:954:43: runtime error: member access within null pointer of type 'B3DPrimitiveFace' (aka 'struct B3DPrimitiveFace')

Segmentation fault Sat Feb 8 21:19:37 2020

VM: 202002050212 nicolas@MBP-de-Nicolas:Smalltalk/OpenSmalltalk/opensmalltalk-vm
Date: Tue Feb 4 18:12:07 2020 CommitHash: 0f974af6a
Plugins: 202002050212 nicolas@MBP-de-Nicolas:Smalltalk/OpenSmalltalk/opensmalltalk-vm

C stack backtrace & registers:
    eax 0x00000018 ebx 0x00000000 ecx 0x040835a8 edx 0x00000000
    edi 0x040835a8 esi 0x040835a8 ebp 0xbfeec978 esp 0xbfeec940
    eip 0x0f0584b5
0 Squeak3D 0x0f0584b5 b3dAddFrontFill + 118
1 Squeak 0x00275ea4 reportStackState + 870
2 Squeak 0x00276862 sigsegv + 353
3 libsystem_platform.dylib 0xa7dffbae _sigtramp + 46
4 ??? 0xffffffff 0x0 + 4294967295
5 Squeak3D 0x0f05a0ee b3dToggleTopFills + 604
6 Squeak3D 0x0f05cdc0 b3dMainLoop + 7239
7 Squeak3D 0x0f017adb b3dStartRasterizer + 1668
|
On Sat, Feb 8, 2020 at 21:45, Nicolas Cellier <[hidden email]> wrote:
And I see that this one more closely matches your crash.dmp report... So the negativeInt<<shift was another problem, created by my clang version on OSX. I ran a few times without a crash, so I thought it was fixed, but your crash is not fixed yet...
|
I have instrumented the fill list machinery a bit more, and here is a logic error I caught:

* frame #0: 0x14ad0fee Squeak3D`b3dAbort(msg="Trying to remove a face not in fillList") at b3dMain.c:87:2 [opt]
  frame #1: 0x14ae774c Squeak3D`b3dRemoveFill(fillList=0x06852cc8, aFace=0x068175a8) at b3dMain.c:938:54 [opt]
  frame #2: 0x14aecbe2 Squeak3D`b3dMainLoop(state=0x14b195ac, stopReason=0) at b3dMain.c:1379:7 [opt]
  frame #3: 0x14aa5b43 Squeak3D`b3dStartRasterizer at Squeak3D.c:1701:12 [opt]

If we remove a face which is not in the list, then we are going to corrupt the fill list... Where does that happen?

   1376    if(leftEdge == lastIntersection) {
   1377      /* Special case if this is an intersection edge */
   1378      assert(fillList->firstFace == leftEdge->leftFace);
-> 1379      b3dRemoveFill(fillList, leftEdge->rightFace);
   1380      b3dAddFrontFill(fillList, leftEdge->rightFace);
   1381    } else {

Ah, a special case of intersection edge... Why would the rightFace be, or not be, already in the fillList? Is it really a loop invariant? Hmm, hard to answer without a deeper understanding of the whole loop... I do not even have an idea of what a left/right/top face is, so no semantic clue.

What I suggest as a poor man's correction is to protect the removal with an if(b3dIsInFillList(fillList,rightFace)) condition...

On Sat, Feb 8, 2020 at 21:54, Nicolas Cellier <[hidden email]> wrote:
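The guarded removal suggested above could look roughly like the sketch below. The `Face` and `FillList` types are simplified stand-ins, and `isInFillList`, `addFrontFill` and `removeFillIfPresent` are hypothetical names; the real B3DPrimitiveFace, B3DFillList and b3d* functions in b3dMain.c have more fields and may differ in detail.

```c
#include <stddef.h>

/* Simplified stand-ins for the plugin's face and fill list structures. */
typedef struct Face {
    struct Face *prevFace;
    struct Face *nextFace;
    float minZ;
} Face;

typedef struct FillList {
    Face *firstFace;
    Face *lastFace;
} FillList;

/* Linear membership test: returns 1 if aFace is linked into fillList. */
static int isInFillList(const FillList *fillList, const Face *aFace)
{
    const Face *face;
    for (face = fillList->firstFace; face != NULL; face = face->nextFace)
        if (face == aFace) return 1;
    return 0;
}

/* Link aFace at the head of the doubly linked list. */
static void addFrontFill(FillList *fillList, Face *aFace)
{
    aFace->prevFace = NULL;
    aFace->nextFace = fillList->firstFace;
    if (fillList->firstFace) fillList->firstFace->prevFace = aFace;
    else fillList->lastFace = aFace;
    fillList->firstFace = aFace;
}

/* Unlink aFace only if it is really present, so that a stray face can
   never corrupt the first/last pointers. Returns 1 if a face was removed. */
static int removeFillIfPresent(FillList *fillList, Face *aFace)
{
    if (!isInFillList(fillList, aFace)) return 0;
    if (aFace->prevFace) aFace->prevFace->nextFace = aFace->nextFace;
    else fillList->firstFace = aFace->nextFace;
    if (aFace->nextFace) aFace->nextFace->prevFace = aFace->prevFace;
    else fillList->lastFace = aFace->prevFace;
    aFace->prevFace = aFace->nextFace = NULL;
    return 1;
}
```

The membership test makes each removal linear in the length of the list, and it only hides the symptom: it does not explain why the face was expected to be in the list in the first place.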
|
With instrumentation, I see another instance of removal of an absent face from the fill list, in b3dToggleTopFills. The logic here seems to be that we expect a face flagged B3D_FACE_ACTIVE to be on the fill list, and we toggle both. So there is another broken invariant.

The poor man's correction is just a correction of symptoms, not of the root cause. I'd prefer the latter if ever we can...

On Sun, Feb 9, 2020 at 11:20, Nicolas Cellier <[hidden email]> wrote:
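One way to hunt for such a broken invariant is a debug-only consistency check. The sketch below counts the faces whose active flag disagrees with their membership in the fill list. Everything in it is an assumption made for illustration: the `FACE_ACTIVE` value, the `FlaggedFace` and `FlaggedList` types and the function names are hypothetical and are not taken from b3dMain.c.

```c
#include <stddef.h>

#define FACE_ACTIVE 0x1u  /* hypothetical value standing in for B3D_FACE_ACTIVE */

/* Simplified stand-ins for the plugin's structures. */
typedef struct FlaggedFace {
    struct FlaggedFace *nextFace;
    unsigned flags;
} FlaggedFace;

typedef struct FlaggedList {
    FlaggedFace *firstFace;
} FlaggedList;

/* Returns 1 if aFace can be reached by walking the list from its head. */
static int isLinked(const FlaggedList *list, const FlaggedFace *aFace)
{
    const FlaggedFace *face;
    for (face = list->firstFace; face != NULL; face = face->nextFace)
        if (face == aFace) return 1;
    return 0;
}

/* Counts the faces for which "flagged active" and "linked in the fill
   list" disagree. A result of 0 means the assumed invariant holds. */
static size_t countFlagMismatches(const FlaggedList *list,
                                  const FlaggedFace *faces, size_t n)
{
    size_t i, mismatches = 0;
    for (i = 0; i < n; i++) {
        int flagged = (faces[i].flags & FACE_ACTIVE) != 0;
        int linked = isLinked(list, &faces[i]);
        if (flagged != linked) mismatches++;
    }
    return mismatches;
}
```

Calling such a check after each toggle in an instrumented build would point at the first operation that breaks the invariant, rather than at the later crash.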
|
So it seems that this second failure, related to a wrong B3D_FACE_ACTIVE flag, was caused by my own fix. The bug should have disappeared after https://github.com/OpenSmalltalk/opensmalltalk-vm/commit/36a1f1e2ef637347ed3b81a2f4cf8df347e4d803

Without a proper understanding, this must be considered a "workaround" rather than a proper fix. It means that it solves the symptoms, but maybe not the root cause...

I consider that it's nice to have a Squeak3D plugin working. But remember that it's using the CPU rather than the GPU, so it should presumably be superseded by something more up to date. If only we could properly document the algorithm, that would also avoid wild guesses, workarounds and incomplete patches...

On Sun, Feb 9, 2020 at 12:38, Nicolas Cellier <[hidden email]> wrote:
|
> I consider that it's nice to have a Squeak3D plugin working. Yes, because it brilliantly expands the domain of what we can do in Smalltalk only, without any external dependency and in a fully portable way. It allows keeping the ease of development that Squeak provides even while coding an actual 3D game, for example, and although it is not modern, although it is slow and altogether clumsy, it is there, a fine piece among our wonderful Lego collection (MIDI & audio support, BitBlt, Morphic, etc you know the stuff). Stef |