VMMaker-tfel.358 in Inbox: Fixes for BitBlt simulation


timfelgentreff
Hi,

I've pushed a change to the BitBlt simulation code to the Inbox (VMMaker-tfel.358), because I didn't know where else to put it. With these changes, we are able to run a current 4.5 image with VMMaker loaded on our RSqueakVM, with BitBlt entirely run from within the image.

The goal is to have the VM run as many plugins as possible from pure Smalltalk, so there will be more slight changes and maybe the odd performance improvement for the simulation forthcoming. Is this something that would be ok with everyone?

Here's the diff:

BitBltSimulation>>loadColorMap: (changed)
loadColorMap
        "ColorMap, if not nil, must be longWords, and
        2^N long, where N = sourceDepth for 1, 2, 4, 8 bits,
        or N = 9, 12, or 15 (3, 4, 5 bits per color) for 16 or 32 bits."
        | cmSize oldStyle oop cmOop |
        <inline: true>
        cmFlags := cmMask := cmBitsPerColor := 0.
        cmShiftTable := nil.
        cmMaskTable := nil.
        cmLookupTable := nil.
        cmOop := interpreterProxy fetchPointer: BBColorMapIndex ofObject: bitBltOop.
        cmOop = interpreterProxy nilObject
                ifTrue: [^ true].
        cmFlags := ColorMapPresent.
        "even if identity or somesuch - may be cleared later"
        oldStyle := false.
        (interpreterProxy isWords: cmOop)
                ifTrue: ["This is an old-style color map (indexed only, with implicit
                        RGBA conversion)"
                        cmSize := interpreterProxy slotSizeOf: cmOop.
                        cmLookupTable := interpreterProxy firstIndexableField: cmOop.
-                       oldStyle := true.
-                       self
-                               cCode: ''
-                               inSmalltalk: [self assert: cmLookupTable unitSize = 4]]
+                       oldStyle := true]
                ifFalse: ["A new-style color map (fully qualified)"
                        ((interpreterProxy isPointers: cmOop)
                                        and: [(interpreterProxy slotSizeOf: cmOop)
                                                        >= 3])
                                ifFalse: [^ false].
                        cmShiftTable := self
                                                loadColorMapShiftOrMaskFrom: (interpreterProxy fetchPointer: 0 ofObject: cmOop).
                        cmMaskTable := self
                                                loadColorMapShiftOrMaskFrom: (interpreterProxy fetchPointer: 1 ofObject: cmOop).
                        oop := interpreterProxy fetchPointer: 2 ofObject: cmOop.
                        oop = interpreterProxy nilObject
                                ifTrue: [cmSize := 0]
                                ifFalse: [(interpreterProxy isWords: oop)
                                                ifFalse: [^ false].
                                        cmSize := interpreterProxy slotSizeOf: oop.
                                        cmLookupTable := interpreterProxy firstIndexableField: oop].
                        cmFlags := cmFlags bitOr: ColorMapNewStyle.
                        self
                                cCode: ''
                                inSmalltalk: [self assert: cmShiftTable unitSize = 4.
                                        self assert: cmMaskTable unitSize = 4.
                                        self assert: cmLookupTable unitSize = 4]].
        (cmSize bitAnd: cmSize - 1)
                        = 0
                ifFalse: [^ false].
        cmMask := cmSize - 1.
        cmBitsPerColor := 0.
        cmSize = 512
                ifTrue: [cmBitsPerColor := 3].
        cmSize = 4096
                ifTrue: [cmBitsPerColor := 4].
        cmSize = 32768
                ifTrue: [cmBitsPerColor := 5].
        cmSize = 0
                ifTrue: [cmLookupTable := nil.
                        cmMask := 0]
                ifFalse: [cmFlags := cmFlags bitOr: ColorMapIndexedPart].
        oldStyle
                ifTrue: ["needs implicit conversion"
                        self setupColorMasks].
        "Check if colorMap is just identity mapping for RGBA parts"
        (self isIdentityMap: cmShiftTable with: cmMaskTable)
                ifTrue: [cmMaskTable := nil.
                        cmShiftTable := nil]
                ifFalse: [cmFlags := cmFlags bitOr: ColorMapFixedPart].
        ^ true


BitBltSimulator>>halftoneAt: (added)
+halftoneAt: idx
+       ^ halftoneBase + (idx \\ halftoneHeight * 4) long32At: 0
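
The size test in loadColorMap can be tried on its own. The following is a minimal workspace sketch, not code from VMMaker: the block names isPowerOfTwo and bitsPerColor are made up for illustration, and only the arithmetic is taken from the method above.

```smalltalk
"Workspace sketch; isPowerOfTwo and bitsPerColor are hypothetical names."
| isPowerOfTwo bitsPerColor |
"Same test as in loadColorMap: a power of two has exactly one bit set."
isPowerOfTwo := [:cmSize | (cmSize bitAnd: cmSize - 1) = 0].
"Same mapping as in loadColorMap: 2^9, 2^12 and 2^15 entries mean 3, 4 and 5 bits per color."
bitsPerColor := [:cmSize |
	cmSize = 512
		ifTrue: [3]
		ifFalse: [cmSize = 4096
				ifTrue: [4]
				ifFalse: [cmSize = 32768
						ifTrue: [5]
						ifFalse: [0]]]].

isPowerOfTwo value: 256.     "true"
isPowerOfTwo value: 300.     "false"
isPowerOfTwo value: 0.       "true, so cmSize = 0 has to be handled separately"
bitsPerColor value: 4096.    "4"
```

Note that a size of 0 passes the bit test, which is presumably why the method checks cmSize = 0 explicitly afterwards and clears cmLookupTable and cmMask in that case.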



Re: VMMaker-tfel.358 in Inbox: Fixes for BitBlt simulation

Eliot Miranda-2
 
Hi Tim,

On Thu, Feb 12, 2015 at 8:55 AM, timfelgentreff <[hidden email]> wrote:

Hi,

I've pushed a change to the BitBlt simulation code to the Inbox
(VMMaker-tfel.358), because I didn't know where else to put it. With these
changes, we are able to run a current 4.5 image with VMMaker loaded on our
RSqueakVM, with BitBlt entirely run from within the image.

The goal is to have the VM run as many plugins as possible from pure
Smalltalk, so there will be more slight changes and maybe the odd
performance improvement for the simulation forthcoming. Is this something
that would be ok with everyone?

It's certainly good for me; thanks.  Since you're looking at BitBlt code let me try and rope you in to a problem I'm having with 64-bit Spur.  Right now a number of tests fail because of byte-swapping of bits data, e.g. ShortIntegerArray, failing on 64-bit Spur.  This is done with BitBlt.  See ShortIntegerArray>>restoreEndianness.  Apart from the fact that this is an absurd way to do things (*) it should work and right now doesn't.  Would you be interested in taking a look at it and trying to figure out why?  If you're interested you'll need a 64-bit linux for the real VM, and I'll put together a simulator image and a 64-bit test image for you to play with.

(*) More generally, a) using 6 BitBlt invocations instead of a single byte-reversal primitive is...um, diplomatically, a waste of cycles, but more seriously, b) we're paying for needless byte reversals to keep things in big-endian format.  Little endian has essentially won, with most ARM deployments being little endian and x86 & x64 being little endian.  Shouldn't we be looking to eliminate all this unnecessary overhead?  It's in image segment load/store and sound processing, and it's unnecessary.
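
To make the idea of a single-pass byte reversal concrete, here is a rough workspace sketch that reverses the four bytes of one 32-bit word with plain integer arithmetic. It is only an illustration: swapBytes32 is a hypothetical name, and this is neither an existing primitive nor what ShortIntegerArray>>restoreEndianness currently does.

```smalltalk
"Workspace sketch; swapBytes32 is a hypothetical name, not a VMMaker or image method."
| swapBytes32 |
swapBytes32 := [:word |
	"Move each of the four bytes to its mirrored position and add the parts."
	((word bitAnd: 16rFF) bitShift: 24)
		+ ((word bitAnd: 16rFF00) bitShift: 8)
		+ ((word bitAnd: 16rFF0000) bitShift: -8)
		+ ((word bitShift: -24) bitAnd: 16rFF)].

(swapBytes32 value: 16r11223344) = 16r44332211.   "true"
```

Each word is touched once here, whereas the BitBlt-based approach makes several passes over the same data.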


--
best,
Eliot

Re: VMMaker-tfel.358 in Inbox: Fixes for BitBlt simulation

timfelgentreff
I'm also interested in getting 64-bit to run properly on the RSqueakVM. I'll see if I can make time to investigate.

Re: VMMaker-tfel.358 in Inbox: Fixes for BitBlt simulation

timfelgentreff
I've pushed another update to the BitBltSimulator to the inbox as VMMaker-tfel.359. This makes initialiseModule be called only once when we're simulating an entire image, which makes the simulation of various BitBlt operations around 200x faster for me on Cog, and around 500x faster on RSqueakVM. This is only in the Simulator class, so it won't affect the plugin. Can someone take a look and, if it's ok, move it to the VMMaker repository?

Re: VMMaker-tfel.358 in Inbox: Fixes for BitBlt simulation

Eliot Miranda-2
 
Hi Tim,

On Wed, Mar 11, 2015 at 10:47 AM, timfelgentreff <[hidden email]> wrote:

I've pushed another update to the BitBltSimulator to the inbox as
VMMaker-tfel.359. This makes initialiseModule be called only once when we're
simulating an entire image, which makes the simulation of various BitBlt
operations around 200x faster for me on Cog, and around 500x faster on
RSqueakVM. This is only in the Simulator class, so it won't affect the
plugin. Can someone take a look and if it's ok move it to the VMMaker
repository?

Thanks!  How does initialiseModule get called so often?  I don't see how this happens in the Cog simulator.  AFAICT initialiseModule gets called once when the plugin is loaded.  What am I missing?

Also, could you explain the changes in VMMaker-tfel.358?  What was the bug?

Great to have you on board!
 



--
best,
Eliot

Re: VMMaker-tfel.358 in Inbox: Fixes for BitBlt simulation

timfelgentreff
In reply to this post by Eliot Miranda-2
Hi Eliot,

The bugs in 358 were:

For loadColorMap, the assertion was simply failing. Since no assertion is generated in C, Tobias and I figured it may simply not be needed, and everything seemed to work without it.

And halftoneAt: was simply missing: BitBltSimulator overrides dstLongAt: and srcLongAt:, but didn't include an override for halftoneAt:, so we ran into a debugger when trying to simulate. Adding this method fixes that.
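
The address arithmetic in the added halftoneAt: can be checked in a workspace. This is a small sketch under assumptions: halftoneHeight is set to an arbitrary value of 4, byteOffsetFor is a made-up block name, and only the expression idx \\ halftoneHeight * 4 comes from the method in 358.

```smalltalk
"Workspace sketch; byteOffsetFor is a hypothetical name and halftoneHeight = 4 is an arbitrary example."
| halftoneHeight byteOffsetFor |
halftoneHeight := 4.
"Binary operators evaluate left to right, so this is (idx \\ halftoneHeight) * 4:
 wrap the row index to the halftone height, then scale by 4 bytes per 32-bit word."
byteOffsetFor := [:idx | idx \\ halftoneHeight * 4].

byteOffsetFor value: 0.    "0"
byteOffsetFor value: 3.    "12"
byteOffsetFor value: 5.    "4, since 5 \\ 4 = 1"
byteOffsetFor value: -1.   "12, since \\ rounds toward negative infinity"
```

The resulting offset is added to halftoneBase, and long32At: 0 then reads the 32-bit word at that address.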

Regarding how initialiseModule was called so often: we're using the code in the plugins a little unconventionally. We have a VM without the BitBlt plugin, and when the named primitive comes up, we instead dispatch to BitBlt>>copyBitsSimulated and then simulate _only_ the BitBlt part, not the entire image. But that entails creating a new InterpreterProxy and initialising it from the current context, and thus also creating a new instance of the BitBltSimulator. That's how initialiseModule ends up being called often. 359 doesn't remove those calls, it just caches those constant tables on the class side.
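
The caching idea can be sketched in a workspace as follows. This is not the code from 359: a workspace variable stands in for the class-side variable, the table contents are arbitrary, and the names cache, buildCount and tableFor are made up for illustration.

```smalltalk
"Workspace sketch; cache stands in for a class-side variable, and all names here are hypothetical."
| cache buildCount tableFor |
buildCount := 0.
cache := nil.
"Build the constant table on first use only; later calls return the stored table."
tableFor := [cache ifNil: [
		buildCount := buildCount + 1.
		cache := (0 to: 255) collect: [:i | i * i]]].

"Stands in for three fresh simulator instances each running initialiseModule."
3 timesRepeat: [tableFor value].

buildCount.                           "1"
tableFor value == tableFor value.     "true, every call shares the same table"
```

With the table held on the class side, every new simulator instance still runs its initialisation, but the expensive table construction happens only once.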

I've just pushed VMMaker-tfel.360 to the inbox, which adds methods so we can do the same with Balloon (BalloonEngine gains a #simulateBalloonPrimitive:args:). The idea is that we can run the VM without the BitBlt and Balloon plugins and just run the Slang code (on RSqueakVM with the changes from 359, we get about 50% of the BitBlt performance running the simulation compared to the C plugin).