Hi All,

I see that Float 32-bit word order is big-endian (PowerPC) on all platforms. This is a pain for performance and a pain for code generation in Cog. For example, using SSE2 instructions it is trivial to swizzle a PowerPC-layout Float into an xmm register with the PSHUFD instruction, but tediously verbose to swizzle on write. One has to swizzle the xmm register itself, which is destructive, so writing a Float result takes three instructions (shuffle, write, unshuffle). Yes, OK, 2 extra instructions is small potatoes, but they're still starch. So I wonder what the impact would be of maintaining Floats in platform order. There are a number of possible solutions.
1. Floats are always in platform order and are swizzled on image load when moving from little-endian to big-endian or vice versa. Image code must be rewritten to take the platform's endianness into account. (Requires an image rewrite.)
2. As for 1, but the image is isolated from the change by providing two primitives, primitiveFloatAt and primitiveFloatAtPut, which are bound to the selectors at:, basicAt:, at:put: and basicAt:put: in Float. These primitives map index 1 onto the most significant word and index 2 onto the least significant word. (Requires no image rewrite, but does require a file-in of the four implementations.)
3. As for 1, but the image is isolated from the change by providing four primitives: primitiveFloatLowWord, primitiveFloatLowWordPut, primitiveFloatHighWord and primitiveFloatHighWordPut. (Requires as much of a rewrite of image code as 1.)
4. As for 1, but provide two primitives, primitiveFloatBits and primitiveFloatBitsPut, which answer or store 64-bit non-negative integers. (Requires as much of a rewrite of image code as 1, but is cleaner and scales to 128-bit floats.)
5. Modify the existing at:[put:] primitives to check for Float receivers (in our Qwaq images Float has a compact class index of 6), e.g. in commonVariable:at:cacheIndex:

    fmt < 8 ifTrue: "Bitmap (& Float!!)"
        [(self compactClassIndexOf: rcvr) == ClassFloatCompactClassIndex
            ifTrue: ["2 - index: on a little-endian host the most significant word is the second slot"
                result := self fetchLong32: 2 - index ofObject: rcvr]
            ifFalse: [result := self fetchLong32: index - 1 ofObject: rcvr].
        ^self positive32BitIntegerFor: result].
This slows down at: access for Bitmap and complicates an already overcomplicated, and performance-critical, primitive.
6. Eat it: do the swizzling on every Float access.

6. is apparently painless but actually absurd, because we would be throwing away performance for no good reason.
5. Ditto, not for Float but for Bitmap access (and Bitmap is used in the VM simulator ;) ).
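To make the layout difference behind these options concrete, here is a small C sketch. All names are hypothetical (they are not VM or plugin functions), and it assumes IEEE 754 doubles stored in the host's integer byte order, which holds on x86 and PowerPC. float_bits is roughly what option 4's primitiveFloatBits would answer, ppc_slot shows today's most-significant-word-first layout, native_slot shows platform order, and swap_halves is the swizzle a little-endian VM must apply on every Float read and write under the current layout.

```c
#include <stdint.h>
#include <string.h>

/* The IEEE 754 bit pattern as one non-negative 64-bit integer (option 4).
   It does not depend on how the two 32-bit words are laid out in memory. */
uint64_t float_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* reinterpret without aliasing issues */
    return bits;
}

/* Slot 0 or 1 of a Float body in the current layout:
   the most significant word always comes first. */
uint32_t ppc_slot(double d, int slot) {
    uint64_t bits = float_bits(d);
    return slot == 0 ? (uint32_t)(bits >> 32) : (uint32_t)bits;
}

/* Slot 0 or 1 if the body were kept in platform order (a plain native store). */
uint32_t native_slot(double d, int slot) {
    uint32_t words[2];
    memcpy(words, &d, sizeof words);
    return words[slot];
}

/* 1 on little-endian hosts such as x86, 0 on big-endian hosts such as PowerPC. */
int host_is_little_endian(void) {
    const uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Exchange the two 32-bit halves of a 64-bit pattern: the conversion between
   the current layout and platform order on a little-endian host. */
uint64_t swap_halves(uint64_t bits) {
    return (bits << 32) | (bits >> 32);
}
```

On a little-endian host the two layouts differ by exactly one swap_halves (the PSHUFD shuffle in Cog's case); on a big-endian host they coincide.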
2. is my recommendation because, of the solutions that provide maximum performance, it requires the least effort from adopters.

Opinions & alternatives? Especially, what are the likely issues of moving to platform Float order?
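Option 2 might look roughly like this in C. The names and signatures are illustrative only: real primitives would be Slang methods that take oops and fail rather than return a flag. The sketch assumes the Float body is stored as a native double (platform order), yet index 1 still answers the most significant 32-bit word and index 2 the least significant one, so image code using at:/basicAt: on Floats sees what it sees today.

```c
#include <stdint.h>
#include <string.h>

static int host_is_little_endian(void) {
    const uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Which 32-bit slot of a platform-order body holds the word the image calls index:
   the most significant word is the second slot on little-endian hosts. */
static int slot_for_index(int index) {
    return host_is_little_endian() ? 2 - index : index - 1;
}

/* Hypothetical primitiveFloatAt: answers 1 and stores the word in *out,
   or answers 0 (primitive failure) for an index other than 1 or 2. */
int float_at(const double *receiver, int index, uint32_t *out) {
    uint32_t words[2];
    if (index < 1 || index > 2) return 0;
    memcpy(words, receiver, sizeof words);
    *out = words[slot_for_index(index)];
    return 1;
}

/* Hypothetical primitiveFloatAtPut: same index mapping, writing one word. */
int float_at_put(double *receiver, int index, uint32_t word) {
    uint32_t words[2];
    if (index < 1 || index > 2) return 0;
    memcpy(words, receiver, sizeof words);
    words[slot_for_index(index)] = word;
    memcpy(receiver, words, sizeof words);
    return 1;
}
```

slot_for_index makes the same 2 - index / index - 1 choice that the commonVariable:at:cacheIndex: excerpt in option 5 hard-codes for a little-endian host, but confined to the Float primitives, so the generic at: path used by Bitmap is left alone.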
Best
Eliot
Does this collide with reverseBytesInImage?

On 18-Apr-09, at 6:15 PM, Eliot Miranda wrote:
> Hi All,
>
> I see that Float 32-bit word order is big-endian (PowerPC) on all platforms.

--
John M. McIntosh <[hidden email]> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
2009/4/19 Eliot Miranda <[hidden email]>:
> Opinions & alternatives? Especially, what are the likely issues of moving to platform Float order?

Hmm... what is the practical use of splitting a 32-bit float (as well as a 64-bit one) into two words? I think that, from the image side, it would be better to treat floats as black boxes without exposing their bit order anywhere. Then we need just two primitives to serialize/deserialize them in a byte array:

ByteArray>>floatAt: index bigEndian: boolean
ByteArray>>floatAt: index put: floatValue bigEndian: boolean

(Note: the endianness should be provided explicitly.)

P.S. I am for the swizzling at image load.

--
Best regards,
Igor Stasenko AKA sig.
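A C analogue of Igor's two ByteArray methods, as a sketch: the function names are hypothetical, the index is 1-based as in Smalltalk, bounds checking is left to the caller, and only the 8-byte double case is shown. Because the byte order is an explicit argument, the result depends neither on the host nor on how Floats are laid out in object memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the 8 bytes starting at 1-based index as an IEEE 754 double. */
double byte_array_float_at(const uint8_t *bytes, size_t index, int big_endian) {
    const uint8_t *p = bytes + (index - 1);
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++) {
        /* byte carrying bits 8*i .. 8*i+7 of the pattern */
        uint8_t b = big_endian ? p[7 - i] : p[i];
        bits |= (uint64_t)b << (8 * i);
    }
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Store value as 8 bytes starting at 1-based index, in the requested byte order. */
void byte_array_float_at_put(uint8_t *bytes, size_t index, double value, int big_endian) {
    uint8_t *p = bytes + (index - 1);
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    for (int i = 0; i < 8; i++) {
        uint8_t b = (uint8_t)(bits >> (8 * i));
        if (big_endian)
            p[7 - i] = b;
        else
            p[i] = b;
    }
}
```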
On Sat, Apr 18, 2009 at 6:30 PM, John M McIntosh <[hidden email]> wrote:
Yes, but it's easy to fix:

    (fmt = 6 and: [BytesPerWord = 8]) ifTrue:
        ["Object contains 32-bit half-words packed into 64-bit machine words."
        wordAddr := oop + BaseHeaderSize.
        self reverseWordsFrom: wordAddr to: oop + (self sizeBitsOf: oop)]].

=>

    fmt = 6 ifTrue:
        [(self fetchClassOfNonInt: oop) = floatClass
            ifTrue: ["Float: exchange its two 32-bit words."
                self swapWordFrom: oop + BaseHeaderSize to: oop + BaseHeaderSize + 8]
            ifFalse:
                [BytesPerWord = 8 ifTrue:
                    ["Object contains 32-bit half-words packed into 64-bit machine words."
                    wordAddr := oop + BaseHeaderSize.
                    self reverseWordsFrom: wordAddr to: oop + (self sizeBitsOf: oop)]]]].
(BTW, is reverseWordsFrom:to: broken for 64-bit images?)
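To see what the extra Float case adds to reverseBytesInImage, here is a hedged C sketch for a 32-bit object memory. The helper names are hypothetical, and it assumes Floats are kept in platform order on both hosts. The test input 0x0000F03F, 0 is what a little-endian host reads from the bytes 3F F0 00 00 00 00 00 00 that a big-endian host wrote for 1.0.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit word. */
uint32_t byte_swap32(uint32_t w) {
    return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* Load-time fix-up for one Float body (two 32-bit words) when the image was
   saved on a host of the other endianness. */
void fix_float_body_on_load(uint32_t body[2]) {
    /* Step 1: what reverseBytesInImage already does to every non-byte object. */
    body[0] = byte_swap32(body[0]);
    body[1] = byte_swap32(body[1]);
    /* Step 2: the Float-only step: exchange the two words. */
    uint32_t tmp = body[0];
    body[0] = body[1];
    body[1] = tmp;
}
```

Byte-reversing each 32-bit word and then exchanging the two words amounts to reversing all eight bytes of the double, which is exactly the conversion between the two platform orders.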
On Sat, 2009-04-18 at 18:15 -0700, Eliot Miranda wrote:
> So I wonder what would the impact be of maintaining Floats in platform order?

I'd like to see Floats stored in native format too. Don't forget about the 32-bit floats in FloatArrays.

Bryce
On Sun, Apr 19, 2009 at 6:43 AM, Bryce Kampjes <[hidden email]> wrote:
Tell me more :) Are these in some funky order, or are they just IEEE single precision in platform order?
On Sun, Apr 19, 2009 at 07:57:20AM -0700, Eliot Miranda wrote:
> On Sun, Apr 19, 2009 at 6:43 AM, Bryce Kampjes <[hidden email]> wrote:
> > I'd like to see Floats stored in native format too. Don't forget about
> > the 32 bit floats in Float arrays.
>
> Tell me more :) Are these in some funky order, or are they just IEEE single
> precision in platform order?
The attached screenshot (world.png) is from an Intel box, with hex printouts of the contents of an IntegerArray and a FloatArray (note: OopPlugin is a utility that I use for accessing the internals of object memory slots in the real object memory). It shows the internal storage of float values in a FloatArray. I poked various values into the array so you can see where they are stored in the 64-bit object memory words.

The values in a FloatArray are 32-bit floats, packed into 64-bit slots in the object memory. There are no endian issues to worry about. On both 32-bit and 64-bit object memories, the values are arranged in the order of an (int *) access. In other words, they are arrays of 32-bit values that just happen to be stuffed into slots that the object memory thinks are 64-bit words.

Of course, storage of 32-bit floats in FloatArray is unrelated to the original topic of Float swizzling.

> (BTW, is reverseWordsFrom:to: broken for 64-bit images?)

As far as I know, there are no problems with this. The original 64-bit image was made on a big-endian box, and descendants of that image are running on my little-endian box today, so #reverseWordsFrom:to: must have worked.

Dave

[Attachment: world.png (70K)]
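David's point can be sketched in C, assuming a FloatArray body is simply an array of 32-bit words (helper names are hypothetical). Each element is one 32-bit word, so byte-reversing every 32-bit word at load time is the whole fix-up and no words change places. On a 64-bit object memory the same net effect comes from reversing each 64-bit word and then restoring the 32-bit word order with reverseWordsFrom:to:, as in the fmt = 6 code quoted earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the four bytes of a 32-bit word. */
uint32_t byte_swap32(uint32_t w) {
    return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* Element i (1-based) of a FloatArray body read as a plain array of 32-bit
   words, i.e. in the order of an (int *) access. */
float float_array_at(const uint32_t *body, size_t i) {
    float f;
    memcpy(&f, &body[i - 1], sizeof f);
    return f;
}

/* Load-time fix-up when the image was saved on a host of the other byte order:
   byte-reverse each 32-bit element in place. */
void fix_float_array_on_load(uint32_t *body, size_t nwords) {
    for (size_t k = 0; k < nwords; k++)
        body[k] = byte_swap32(body[k]);
}
```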
On 19-Apr-09, at 8:37 AM, David T. Lewis wrote:
> The values in a FloatArray are 32-bit floats, packed into 64-bit slots in the object memory. There are no endian issues to worry about. On both 32-bit and 64-bit object memories, the values are arranged in the order of an (int *) access. In other words, they are arrays of 32-bit values that just happen to be stuffed onto slots that the object memory thinks are 64-bit words.

Well, that's not quite true; you have to be careful here because people might move data in and out of the FloatArray. But let's see...

    MatrixTransform2x3>>at: index put: value
        <primitive: 'primitiveAtPut' module: 'FloatArrayPlugin'>
        value isFloat
            ifTrue: [self basicAt: index put: value asIEEE32BitWord]
            ifFalse: [self at: index put: value asFloat].
        ^value

    CGPoint>>x: aValue
        self unsignedLongAt: 1 put: aValue asFloat asIEEE32BitWord bigEndian: SmalltalkImage current isBigEndian.

OK, well, without looking I'll assume the reverseBytesInImage logic swaps the bytes in the FloatArray at load time, so that accessors that use SmalltalkImage current isBigEndian move data in and out in the proper form.

--
John M. McIntosh
On Sun, Apr 19, 2009 at 11:08:14AM -0700, John M McIntosh wrote:
> Well, that's not quite true; you have to be careful here because people might move data in and out of the FloatArray. But let's see...

As near as I can tell, all accesses to FloatArray and IntegerArray are on 32-bit boundaries for both 32-bit and 64-bit images, and are not affected by host endianness. I should mention that I have not tried FloatArrayPlugin on 64-bit images; I should probably have a look at that one of these days.

> OK, well, without looking I'll assume the reverseBytesInImage logic swaps the bytes in the FloatArray at load time, so that accessors that use SmalltalkImage current isBigEndian move data in and out in the proper form.

Yes, the bytes in a FloatArray would be swapped at load time if moving from one endianness to another, but no, I don't think #isBigEndian is required for accessing the ints or floats on 32-bit boundaries. Also, a 64-bit image containing FloatArray or IntegerArray instances should be correctly byte-swapped when moved from one endianness to another, although I have never actually tried it, so I can't say for sure.
Bottom line: this stuff pretty much just works; there are no special cases to worry about.

Dave