BitBltSimulation buffer overrun

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

BitBltSimulation buffer overrun

Nicolas Cellier
 
Hi all,
while running a debug squeak.cog.spur macos x64, I noticed repeated logs in console:

> (((usqInt)sourceIndex)) < endOfSource 2132

This corresponds to failing assert: in BitBltSimulation slang

    srcLongAt: idx
         <inline: #always>
         self assert: idx asUnsignedInteger < endOfSource.
         ^self long32At: idx

Maybe it's a well known bug, but it's easy to trigger if you are after it:
Open Squeak5.1-16548-64bit.image (I took that of nsboot while testing failing Newspeak bootstrap https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/Cog/tests/newspeakBootstrap.sh)
Open TestRunner and run all tests...

Reply | Threaded
Open this post in threaded view
|

Re: BitBltSimulation buffer overrun

Nicolas Cellier
 
To trigger the failure, just do this:

    BitmapStreamTests new testShortIntegerArrayReadRefStream2.

The source code of BitBlt hack in ShortIntegerArray>>restoreEndianness sounds correct.
The assertion failure occurs when destX < sourceX - presumably a case when hDir = 1 (see checkSourceOverlap).

I used LLDB rather than simulation to find it fast, but now it would be easier to just run VM simulation
Here are the details:
1) I tried LLDB conditional break, but it's awfully slow
2) I modified generated source code and compiled the hardcoded condition:
   2131 /* pick up next word */ if( ((usqInt)sourceIndex) >= endOfSource )
   2132 assert((((usqInt)sourceIndex)) < endOfSource);
3) launch lldb, set the breaskpoint
(lldb) br se -f BitBltPlugin.c -l 2132
(lldb) run ../../image/trunk6-64.image
3) I ran the SUnit tests...

Process 83691 launched: '/Users/nicolas/Smalltalk/OpenSmalltalk/opensmalltalk-vm/build.macos64x64/squeak.cog.spur/SqueakDebug.app/Contents/MacOS/Squeak' (x86_64)
Process 83691 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100186871 Squeak`copyLoop at BitBltPlugin.c:2132
   2129 destMask = mask1;
   2130
   2131 /* pick up next word */ if( ((usqInt)sourceIndex) >= endOfSource )
-> 2132 assert((((usqInt)sourceIndex)) < endOfSource);
   2133 thisWord = long32At(sourceIndex);
   2134 sourceIndex += hInc;
   2135
Target 0: (Squeak) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100186871 Squeak`copyLoop at BitBltPlugin.c:2132
    frame #1: 0x0000000100182b81 Squeak`copyBitsLockedAndClipped at BitBltPlugin.c:1595
    frame #2: 0x000000010017fba6 Squeak`copyBits at BitBltPlugin.c:1257
    frame #3: 0x000000010017fea3 Squeak`primitiveCopyBits at BitBltPlugin.c:5115
    frame #4: 0x000000010000cd2d Squeak`interpret at gcc3x-cointerp.c:6192
    frame #5: 0x0000000100022196 Squeak`enterSmalltalkExecutiveImplementation at gcc3x-cointerp.c:16192
    frame #6: 0x0000000100001657 Squeak`interpret at gcc3x-cointerp.c:2761
    frame #7: 0x000000010015e477 Squeak`-[sqSqueakMainApplication runSqueak](self=0x00000001018091d0, _cmd="runSqueak") at sqSqueakMainApplication.m:201
    frame #8: 0x00007fff41cd87b8 Foundation`__NSFirePerformWithOrder + 360
    frame #9: 0x00007fff3fb4df57 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 23
    frame #10: 0x00007fff3fb4de7f CoreFoundation`__CFRunLoopDoObservers + 527
    frame #11: 0x00007fff3fb303f8 CoreFoundation`__CFRunLoopRun + 1240
    frame #12: 0x00007fff3fb2fc93 CoreFoundation`CFRunLoopRunSpecific + 483
    frame #13: 0x00007fff3ee1ad96 HIToolbox`RunCurrentEventLoopInMode + 286
    frame #14: 0x00007fff3ee1aa0f HIToolbox`ReceiveNextEventCommon + 366
    frame #15: 0x00007fff3ee1a884 HIToolbox`_BlockUntilNextEventMatchingListInModeWithFilter + 64
    frame #16: 0x00007fff3d0caa73 AppKit`_DPSNextEvent + 2085
    frame #17: 0x00007fff3d860e34 AppKit`-[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 3044
    frame #18: 0x00007fff3d0bf885 AppKit`-[NSApplication run] + 764
    frame #19: 0x00007fff3d08ea72 AppKit`NSApplicationMain + 804
    frame #20: 0x0000000100158086 Squeak`main(argc=2, argv=0x00007ffeefbff910, envp=0x00007ffeefbff928) at main.m:74
    frame #21: 0x00007fff67a5d015 libdyld.dylib`start + 1
(lldb) call printCallStack()
    0x7ffeefbd21e8 I ShortIntegerArray>restoreEndianness 0x1083e30e8: a(n) ShortIntegerArray
    0x7ffeefbd2248 I ReadStream(PositionableStream)>nextWordsInto: 0x108377710: a(n) ReadStream
    0x7ffeefbd2298 I ShortIntegerArray class(ArrayedCollection class)>newFromStream: 0x1089b2bf0: a(n) ShortIntegerArray class
    0x7ffeefbd22f0 M ReferenceStream(DataStream)>readWordLike 0x108377738: a(n) ReferenceStream
    0x7ffeefbd2360 I ReferenceStream(DataStream)>next 0x108377738: a(n) ReferenceStream
    0x7ffeefbd23c8 I ReferenceStream>next 0x108377738: a(n) ReferenceStream
    0x7ffeefbd2420 I BitmapStreamTests>testShortIntegerArrayReadRefStream2 0x1083775e0: a(n) BitmapStreamTests
snip...

(lldb) print skew
(sqInt) $25 = -24
(lldb) print preload
(sqInt) $26 = 1
(lldb) print hDir
(sqInt) $14 = 1

It seems to me that it's got something to do with 64bits shifter...
No time to simulate the VM now, if someone want to take it, it's a good exercize...


Le mar. 3 sept. 2019 à 23:44, Nicolas Cellier <[hidden email]> a écrit :
Hi all,
while running a debug squeak.cog.spur macos x64, I noticed repeated logs in console:

> (((usqInt)sourceIndex)) < endOfSource 2132

This corresponds to failing assert: in BitBltSimulation slang

    srcLongAt: idx
         <inline: #always>
         self assert: idx asUnsignedInteger < endOfSource.
         ^self long32At: idx

Maybe it's a well known bug, but it's easy to trigger if you are after it:
Open Squeak5.1-16548-64bit.image (I took that of nsboot while testing failing Newspeak bootstrap https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/Cog/tests/newspeakBootstrap.sh)
Open TestRunner and run all tests...

Reply | Threaded
Open this post in threaded view
|

Re: BitBltSimulation buffer overrun

timrowledge
 


> On 2019-09-04, at 12:48 AM, Nicolas Cellier <[hidden email]> wrote:
>
>
> It seems to me that it's got something to do with 64bits shifter...
> No time to simulate the VM now, if someone want to take it, it's a good exercize...
>

It's highly likely to be a side-effect of the prefetch stuff; we've had 'fun' with this for decades. IIRC there was some real fun on Mac OS years ago if the bitmap memory happened to end just on a page boundary  - or something of that sort.

Basically if the mode involves reading the destination bitmap then we end up with the last pixel trying to read a word *after* the end of the bitmap. Obviously the clipping etc values are supposed to result in those bits (generally a header for the next object in memory of course) getting ignored. It can be amusing if the code is subtly wrong and dodgy bits get into your Forms.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Computer possessed? Try DEVICE=C:\EXOR.SYS


Reply | Threaded
Open this post in threaded view
|

Re: BitBltSimulation buffer overrun

Nicolas Cellier
 


Le mer. 4 sept. 2019 à 18:57, tim Rowledge <[hidden email]> a écrit :
 


> On 2019-09-04, at 12:48 AM, Nicolas Cellier <[hidden email]> wrote:
>
>
> It seems to me that it's got something to do with 64bits shifter...
> No time to simulate the VM now, if someone want to take it, it's a good exercize...
>

It's highly likely to be a side-effect of the prefetch stuff; we've had 'fun' with this for decades. IIRC there was some real fun on Mac OS years ago if the bitmap memory happened to end just on a page boundary  - or something of that sort.

yes, that happens very rarely, but that might happen and SEGV

Basically if the mode involves reading the destination bitmap then we end up with the last pixel trying to read a word *after* the end of the bitmap. Obviously the clipping etc values are supposed to result in those bits (generally a header for the next object in memory of course) getting ignored. It can be amusing if the code is subtly wrong and dodgy bits get into your Forms.

tim
Do we ever use the extra word that we read? I don't think so, we don't generate funny artefacts.
But we could try and test that now that we know how to trigger it (statistically).

I think that Eliot spent some time to debug and eliminate most overrun cases in 2018 (every BitBltSimulation is stamped eem 2018).
But it's a hell of a low level code mixing a bunch of states with a bunch of branches with a very imperative programming style...
I don't know if the code is provable by a machine, but for a human brain, it's far more above the average capabilities ;)
Maybe it's write only kind, even for Dan himself!


--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
Computer possessed? Try DEVICE=C:\EXOR.SYS


Reply | Threaded
Open this post in threaded view
|

Re: BitBltSimulation buffer overrun

timrowledge
 


> On 2019-09-04, at 11:00 AM, Nicolas Cellier <[hidden email]> wrote:
>
> tim
> Do we ever use the extra word that we read? I don't think so, we don't generate funny artefacts.
> But we could try and test that now that we know how to trigger it (statistically).

We certainly shouldn't use that extra word.

Unless something has gone horribly wrong, the only time this is an issue is on the last fetch of the last row of a BLT. So we *could* consider splitting out the last row and only for that last row include a check for the last word, and only for that word include a 'stick 0 in'.

Since this is an assert for the debug, perhaps the smart thing to do is make the assert test more thoughtful in some manner. The typical case simply doesn't matter, since all we do is load a wasted word. Where things do need some care is at the end of object memory and page boundaries etc.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
No single raindrop believes it is to blame for the flood


Reply | Threaded
Open this post in threaded view
|

Re: BitBltSimulation buffer overrun

Nicolas Cellier
 



Le mer. 4 sept. 2019 à 20:16, tim Rowledge <[hidden email]> a écrit :
 


> On 2019-09-04, at 11:00 AM, Nicolas Cellier <[hidden email]> wrote:
>
> tim
> Do we ever use the extra word that we read? I don't think so, we don't generate funny artefacts.
> But we could try and test that now that we know how to trigger it (statistically).

We certainly shouldn't use that extra word.

Unless something has gone horribly wrong, the only time this is an issue is on the last fetch of the last row of a BLT. So we *could* consider splitting out the last row and only for that last row include a check for the last word, and only for that word include a 'stick 0 in'.

Yes, possibly, but then one also has to check the first row whenever vDir = -1 (checkSourceOverlap).
I said a hell of a stateful branchful code, but you know it even better than I ;)

Since this is an assert for the debug, perhaps the smart thing to do is make the assert test more thoughtful in some manner. The typical case simply doesn't matter, since all we do is load a wasted word. Where things do need some care is at the end of object memory and page boundaries etc.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
No single raindrop believes it is to blame for the flood


Reply | Threaded
Open this post in threaded view
|

Re: BitBltSimulation buffer overrun

Nicolas Cellier
 
Hehe, I think I fixed it in VMMaker.oscog-nice.2563
The ultimate tool for understanding and debugging this was pen and paper!
I don't know how young techies will do ;)



Le mer. 4 sept. 2019 à 21:24, Nicolas Cellier <[hidden email]> a écrit :



Le mer. 4 sept. 2019 à 20:16, tim Rowledge <[hidden email]> a écrit :
 


> On 2019-09-04, at 11:00 AM, Nicolas Cellier <[hidden email]> wrote:
>
> tim
> Do we ever use the extra word that we read? I don't think so, we don't generate funny artefacts.
> But we could try and test that now that we know how to trigger it (statistically).

We certainly shouldn't use that extra word.

Unless something has gone horribly wrong, the only time this is an issue is on the last fetch of the last row of a BLT. So we *could* consider splitting out the last row and only for that last row include a check for the last word, and only for that word include a 'stick 0 in'.

Yes, possibly, but then one also has to check the first row whenever vDir = -1 (checkSourceOverlap).
I said a hell of a stateful branchful code, but you know it even better than I ;)

Since this is an assert for the debug, perhaps the smart thing to do is make the assert test more thoughtful in some manner. The typical case simply doesn't matter, since all we do is load a wasted word. Where things do need some care is at the end of object memory and page boundaries etc.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
No single raindrop believes it is to blame for the flood


Reply | Threaded
Open this post in threaded view
|

Re: BitBltSimulation buffer overrun

Ben Coman
 
Thats a great result Nicolas, after its been outstanding for so long.

+ bbW <= startBits
- bbW < startBits
+ preload := bbW > startBits and: [(m1 bitAnd: mask1) ~= mask1]. "i.e. there are some missing bits"
- preload := (m1 bitAnd: mask1) ~= mask1. "i.e. there are some missing bits"

I nominate you for the Charles Steinmetz Chalk Mark Medal...
https://playingintheworldgame.com/2014/07/19/the-chalk-mark/  

cheers -ben


On Sat, 14 Sep 2019 at 03:03, Nicolas Cellier <[hidden email]> wrote:
 
Hehe, I think I fixed it in VMMaker.oscog-nice.2563
The ultimate tool for understanding and debugging this was pen and paper!
I don't know how young techies will do ;)



Le mer. 4 sept. 2019 à 21:24, Nicolas Cellier <[hidden email]> a écrit :



Le mer. 4 sept. 2019 à 20:16, tim Rowledge <[hidden email]> a écrit :
 


> On 2019-09-04, at 11:00 AM, Nicolas Cellier <[hidden email]> wrote:
>
> tim
> Do we ever use the extra word that we read? I don't think so, we don't generate funny artefacts.
> But we could try and test that now that we know how to trigger it (statistically).

We certainly shouldn't use that extra word.

Unless something has gone horribly wrong, the only time this is an issue is on the last fetch of the last row of a BLT. So we *could* consider splitting out the last row and only for that last row include a check for the last word, and only for that word include a 'stick 0 in'.

Yes, possibly, but then one also has to check the first row whenever vDir = -1 (checkSourceOverlap).
I said a hell of a stateful branchful code, but you know it even better than I ;)

Since this is an assert for the debug, perhaps the smart thing to do is make the assert test more thoughtful in some manner. The typical case simply doesn't matter, since all we do is load a wasted word. Where things do need some care is at the end of object memory and page boundaries etc.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
No single raindrop believes it is to blame for the flood


Reply | Threaded
Open this post in threaded view
|

Re: BitBltSimulation buffer overrun

Eliot Miranda-2
In reply to this post by Nicolas Cellier
 


On Fri, Sep 13, 2019 at 12:03 PM Nicolas Cellier <[hidden email]> wrote:
 
Hehe, I think I fixed it in VMMaker.oscog-nice.2563
The ultimate tool for understanding and debugging this was pen and paper!
I don't know how young techies will do ;)

:-)  Thank you!!!!!!!
 



Le mer. 4 sept. 2019 à 21:24, Nicolas Cellier <[hidden email]> a écrit :



Le mer. 4 sept. 2019 à 20:16, tim Rowledge <[hidden email]> a écrit :
 


> On 2019-09-04, at 11:00 AM, Nicolas Cellier <[hidden email]> wrote:
>
> tim
> Do we ever use the extra word that we read? I don't think so, we don't generate funny artefacts.
> But we could try and test that now that we know how to trigger it (statistically).

We certainly shouldn't use that extra word.

Unless something has gone horribly wrong, the only time this is an issue is on the last fetch of the last row of a BLT. So we *could* consider splitting out the last row and only for that last row include a check for the last word, and only for that word include a 'stick 0 in'.

Yes, possibly, but then one also has to check the first row whenever vDir = -1 (checkSourceOverlap).
I said a hell of a stateful branchful code, but you know it even better than I ;)

Since this is an assert for the debug, perhaps the smart thing to do is make the assert test more thoughtful in some manner. The typical case simply doesn't matter, since all we do is load a wasted word. Where things do need some care is at the end of object memory and page boundaries etc.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
No single raindrop believes it is to blame for the flood




--
_,,,^..^,,,_
best, Eliot
Reply | Threaded
Open this post in threaded view
|

Re: BitBltSimulation buffer overrun

Karl Ramberg
In reply to this post by Ben Coman
 
+1

Best,
Karl

On Sat, Sep 14, 2019 at 2:00 AM Ben Coman <[hidden email]> wrote:
 
Thats a great result Nicolas, after its been outstanding for so long.

+ bbW <= startBits
- bbW < startBits
+ preload := bbW > startBits and: [(m1 bitAnd: mask1) ~= mask1]. "i.e. there are some missing bits"
- preload := (m1 bitAnd: mask1) ~= mask1. "i.e. there are some missing bits"

I nominate you for the Charles Steinmetz Chalk Mark Medal...
https://playingintheworldgame.com/2014/07/19/the-chalk-mark/  

cheers -ben


On Sat, 14 Sep 2019 at 03:03, Nicolas Cellier <[hidden email]> wrote:
 
Hehe, I think I fixed it in VMMaker.oscog-nice.2563
The ultimate tool for understanding and debugging this was pen and paper!
I don't know how young techies will do ;)



Le mer. 4 sept. 2019 à 21:24, Nicolas Cellier <[hidden email]> a écrit :



Le mer. 4 sept. 2019 à 20:16, tim Rowledge <[hidden email]> a écrit :
 


> On 2019-09-04, at 11:00 AM, Nicolas Cellier <[hidden email]> wrote:
>
> tim
> Do we ever use the extra word that we read? I don't think so, we don't generate funny artefacts.
> But we could try and test that now that we know how to trigger it (statistically).

We certainly shouldn't use that extra word.

Unless something has gone horribly wrong, the only time this is an issue is on the last fetch of the last row of a BLT. So we *could* consider splitting out the last row and only for that last row include a check for the last word, and only for that word include a 'stick 0 in'.

Yes, possibly, but then one also has to check the first row whenever vDir = -1 (checkSourceOverlap).
I said a hell of a stateful branchful code, but you know it even better than I ;)

Since this is an assert for the debug, perhaps the smart thing to do is make the assert test more thoughtful in some manner. The typical case simply doesn't matter, since all we do is load a wasted word. Where things do need some care is at the end of object memory and page boundaries etc.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
No single raindrop believes it is to blame for the flood


Reply | Threaded
Open this post in threaded view
|

Re: BitBltSimulation buffer overrun

fniephaus
In reply to this post by Nicolas Cellier
 


On Fri, Sep 13, 2019 at 9:03 PM Nicolas Cellier <[hidden email]> wrote:
 
Hehe, I think I fixed it in VMMaker.oscog-nice.2563
The ultimate tool for understanding and debugging this was pen and paper!
I don't know how young techies will do ;)

Great, I ported your fixes to GraalSqueak which contains a Java version of BitBlt. Glad that I could remove a few workarounds to avoid some IndexOutOfBounds errors, but even better that those segfaults are gone.

Well done!

Fabio
 



Le mer. 4 sept. 2019 à 21:24, Nicolas Cellier <[hidden email]> a écrit :



Le mer. 4 sept. 2019 à 20:16, tim Rowledge <[hidden email]> a écrit :
 


> On 2019-09-04, at 11:00 AM, Nicolas Cellier <[hidden email]> wrote:
>
> tim
> Do we ever use the extra word that we read? I don't think so, we don't generate funny artefacts.
> But we could try and test that now that we know how to trigger it (statistically).

We certainly shouldn't use that extra word.

Unless something has gone horribly wrong, the only time this is an issue is on the last fetch of the last row of a BLT. So we *could* consider splitting out the last row and only for that last row include a check for the last word, and only for that word include a 'stick 0 in'.

Yes, possibly, but then one also has to check the first row whenever vDir = -1 (checkSourceOverlap).
I said a hell of a stateful branchful code, but you know it even better than I ;)

Since this is an assert for the debug, perhaps the smart thing to do is make the assert test more thoughtful in some manner. The typical case simply doesn't matter, since all we do is load a wasted word. Where things do need some care is at the end of object memory and page boundaries etc.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
No single raindrop believes it is to blame for the flood


Reply | Threaded
Open this post in threaded view
|

Re: BitBltSimulation buffer overrun

Nicolas Cellier
 
The sole problem is whether a single chalk mark will cover all the possible cases of failures...
Let's cross fingers.

Le lun. 16 sept. 2019 à 18:38, Fabio Niephaus <[hidden email]> a écrit :
 


On Fri, Sep 13, 2019 at 9:03 PM Nicolas Cellier <[hidden email]> wrote:
 
Hehe, I think I fixed it in VMMaker.oscog-nice.2563
The ultimate tool for understanding and debugging this was pen and paper!
I don't know how young techies will do ;)

Great, I ported your fixes to GraalSqueak which contains a Java version of BitBlt. Glad that I could remove a few workarounds to avoid some IndexOutOfBounds errors, but even better that those segfaults are gone.

Well done!

Fabio
 



Le mer. 4 sept. 2019 à 21:24, Nicolas Cellier <[hidden email]> a écrit :



Le mer. 4 sept. 2019 à 20:16, tim Rowledge <[hidden email]> a écrit :
 


> On 2019-09-04, at 11:00 AM, Nicolas Cellier <[hidden email]> wrote:
>
> tim
> Do we ever use the extra word that we read? I don't think so, we don't generate funny artefacts.
> But we could try and test that now that we know how to trigger it (statistically).

We certainly shouldn't use that extra word.

Unless something has gone horribly wrong, the only time this is an issue is on the last fetch of the last row of a BLT. So we *could* consider splitting out the last row and only for that last row include a check for the last word, and only for that word include a 'stick 0 in'.

Yes, possibly, but then one also has to check the first row whenever vDir = -1 (checkSourceOverlap).
I said a hell of a stateful branchful code, but you know it even better than I ;)

Since this is an assert for the debug, perhaps the smart thing to do is make the assert test more thoughtful in some manner. The typical case simply doesn't matter, since all we do is load a wasted word. Where things do need some care is at the end of object memory and page boundaries etc.

tim
--
tim Rowledge; [hidden email]; http://www.rowledge.org/tim
No single raindrop believes it is to blame for the flood