float word order

Eliot Miranda-2
 
Hi All,

    I see that Float 32-bit word order is big-endian (PowerPC) on all platforms.  This is a pain for performance and a pain for code generation in Cog.  For example using SSE2 instructions it is trivial to swizzle a PowerPC-layout Float into an xmm register using the PSHUFD SSE2 instruction but tediously verbose to swizzle on write, because one has to swizzle to an xmm register which is hence destructive, which means three instructions (shuffle, write, unshuffle) just to write a Float result.  Yes, ok 2 extra instructions is small potatoes, but they're still starch.  So I wonder what would the impact be of maintaining Floats in platform order?  There are a number of possible solutions.

1. Floats are always in platform order and swizzled on image load when moving from little-endian to big-endian or vice versa.  Image code must be rewritten to take the platform's endianness into account. (requires an image rewrite)

2.  As for 1 but the image is isolated from the change by providing two primitives, primitiveFloatAt and primitiveFloatAtPut which are implemented with selectors at: basicAt: at:put: and basicAt:put: on Float.  These primitives map index 1 onto the most significant word and index 2 onto the least significant word.  (requires no image rewrite, but does require a file-in of the four implementations)

3. As for 1 but the image is isolated from the change by providing four primitives, primitiveFloatLowWord, primitiveFloatLowWordPut, primitiveFloatHighWord & primitiveFloatHighWordPut (requires as much of a rewrite of image code as 1)

4. As for 1 but provide two primitives, primitiveFloatBits and primitiveFloatBitsPut, which answer or store 64-bit non-negative integers. (requires as much of a rewrite of image code as 1 but is cleaner and scales to 128-bit floats)

5. Modify the existing at:[put:] primitives to check for Float receivers, e.g. (and in our Qwaq images Float has a compact class index of 6) from commonVariable:at:cacheIndex:
    fmt < 8 ifTrue:  "Bitmap (& Float!!)"
        [(self compactClassIndexOf: rcvr) == ClassFloatCompactClassIndex
            ifTrue: [result := self fetchLong32: 2 - index ofObject: rcvr]
            ifFalse: [result := self fetchLong32: index - 1 ofObject: rcvr].
         ^self positive32BitIntegerFor: result].
This slows down at: access for Bitmap and complicates an already overcomplicated, and performance-critical, primitive.

6. eat it.  do the swizzling on every float access


6. is apparently painless but actually absurd because we're unnecessarily throwing away performance for no good reason.

5. ditto, not for Float but for Bitmap access (and Bitmap is used in the vm simulator ;) )

2. is my recommendation because, of the solutions that provide maximum performance, it requires the least effort from adopters.
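
To make option 2 concrete, here is a minimal sketch in C (the language the VM is translated to) of what the two accessors might do. The names float_at and float_at_put are hypothetical stand-ins for primitiveFloatAt and primitiveFloatAtPut, not the actual VM code. The sketch assumes that a double and a 64-bit integer share the same byte order on the host, which holds on mainstream platforms. Index 1 always names the most significant 32-bit word and index 2 the least significant, whatever the in-memory layout is.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for primitiveFloatAt: answer 32-bit word `index`
   (1 = most significant, 2 = least significant) of a double that is held
   in the platform's native layout. */
uint32_t float_at(const double *f, int index)
{
    uint64_t bits;
    assert(index == 1 || index == 2);
    memcpy(&bits, f, sizeof bits);   /* native-order 64-bit pattern */
    return index == 1 ? (uint32_t)(bits >> 32) : (uint32_t)bits;
}

/* Hypothetical stand-in for primitiveFloatAtPut: replace one 32-bit word
   of the double, leaving the other word untouched. */
void float_at_put(double *f, int index, uint32_t word)
{
    uint64_t bits;
    assert(index == 1 || index == 2);
    memcpy(&bits, f, sizeof bits);
    if (index == 1)
        bits = (bits & 0x00000000FFFFFFFFULL) | ((uint64_t)word << 32);
    else
        bits = (bits & 0xFFFFFFFF00000000ULL) | word;
    memcpy(f, &bits, sizeof bits);
}
```

Because the word selection is done with shifts on the 64-bit value rather than by memory offset, image code that uses index 1 for the sign/exponent word keeps working unchanged on either endianness.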

Opinions & alternatives?  Especially, what are the likely issues of moving to platform Float order?

Best
Eliot

Re: float word order

johnmci
 
Does this collide with reverseBytesInImage?

--
===========================================================================
John M. McIntosh <[hidden email]>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================




Re: float word order

Igor Stasenko
In reply to this post by Eliot Miranda-2

Hmm.. what is the practical use of splitting a 32-bit float (or a
64-bit one) into two words?
I think that, from the image side, it would be better to treat floats as
black boxes without exposing their bit order anywhere.
Then we need just two primitives to serialize/deserialize them in a byte array:
ByteArray>> floatAt: index bigEndian: boolean
ByteArray>> floatAt: index put: floatValue bigEndian: boolean
(note: endianness should be provided explicitly).

P.S. I am for the swizzling at image load.
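
As an illustration of the explicit-endianness accessors proposed above, here is a rough C sketch. The function names are hypothetical, the offset is zero-based (the Smalltalk selectors would use a one-based index), and it handles only 64-bit doubles. It assumes a double and a 64-bit integer share the host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store `value` as 8 IEEE 754 bytes at bytes[offset .. offset+7],
   most significant byte first when big_endian is non-zero. */
void float_at_put_endian(uint8_t *bytes, size_t offset, double value, int big_endian)
{
    uint64_t bits;
    assert(bytes != NULL);
    memcpy(&bits, &value, sizeof bits);
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 8 * (7 - i) : 8 * i;
        bytes[offset + i] = (uint8_t)(bits >> shift);
    }
}

/* Read back a double written by float_at_put_endian with the same flag. */
double float_at_endian(const uint8_t *bytes, size_t offset, int big_endian)
{
    uint64_t bits = 0;
    double value;
    assert(bytes != NULL);
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 8 * (7 - i) : 8 * i;
        bits |= (uint64_t)bytes[offset + i] << shift;
    }
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Since the byte order is a parameter of every call, the result does not depend on how the VM lays out Floats internally, which is the point of the black-box approach.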

--
Best regards,
Igor Stasenko AKA sig.

Re: float word order

Eliot Miranda-2
In reply to this post by johnmci
 


On Sat, Apr 18, 2009 at 6:30 PM, John M McIntosh <[hidden email]> wrote:

> This collide with reverseBytesInImage ?

yes, but easy to fix:

(fmt = 6 and: [BytesPerWord = 8]) ifTrue:
    ["Object contains 32-bit half-words packed into 64-bit machine words."
     wordAddr := oop + BaseHeaderSize.
     self reverseWordsFrom: wordAddr to: oop + (self sizeBitsOf: oop)].
=>
fmt = 6 ifTrue:
    [(self fetchClassOfNonInt: oop) = floatClass
        ifTrue:
            [self swapWordFrom: oop + BaseHeaderSize to: oop + BaseHeaderSize + 8]
        ifFalse:
            [BytesPerWord = 8 ifTrue:
                ["Object contains 32-bit half-words packed into 64-bit machine words."
                 wordAddr := oop + BaseHeaderSize.
                 self reverseWordsFrom: wordAddr to: oop + (self sizeBitsOf: oop)]]].
 
(BTW, is reverseWordsFrom:to: broken for 64-bit images?)
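
For readers less familiar with the loader, here is a hedged C sketch of the net effect the Float fix-up is meant to have in a 32-bit image that was saved on a platform of the opposite endianness. The function names are hypothetical and this is not the VM's code. It assumes the fix-up sees the raw words as stored in the file, so it performs both steps itself: it byte-reverses each 32-bit word and then exchanges the two words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the four bytes of one 32-bit word. */
uint32_t byte_swap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000FF00u)
         | ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* Hypothetical load-time fix-up for one Float body: byte-reverse each
   32-bit word, then exchange the two words. The net effect is a full
   8-byte reversal, which turns a double in the other platform's native
   order into one in this platform's native order. */
void fix_float_body(uint32_t body[2])
{
    assert(body != NULL);
    uint32_t first = byte_swap32(body[0]);
    uint32_t second = byte_swap32(body[1]);
    body[0] = second;
    body[1] = first;
}
```

Applying the fix-up twice restores the original words, which is what one would expect when an image is moved to the other endianness and back.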








Re: float word order

Bryce Kampjes
In reply to this post by Eliot Miranda-2
 
I'd like to see Floats stored in native format too.  Don't forget about
the 32 bit floats in Float arrays.

Bryce


Re: float word order

Eliot Miranda-2
 


On Sun, Apr 19, 2009 at 6:43 AM, Bryce Kampjes <[hidden email]> wrote:

> I'd like to see Floats stored in native format too.  Don't forget about
> the 32 bit floats in Float arrays.

Tell me more :)  Are these in some funky order, or are they just IEEE single precision in platform order?



Re: float word order

David T. Lewis
 
On Sun, Apr 19, 2009 at 07:57:20AM -0700, Eliot Miranda wrote:

>  
> On Sun, Apr 19, 2009 at 6:43 AM, Bryce Kampjes <[hidden email]>wrote:
>
> >
> > I'd like to see Floats stored in native format too.  Don't forget about
> > the 32 bit floats in Float arrays.
>
>
> Tell me more :)  Are these in some funky order, or are they just IEEE single
> precision in platform order?

The attached world.png is a screen shot of a 64-bit image running on
an Intel box, with hex printouts of the contents of an IntegerArray and
a FloatArray (note, OopPlugin is a utility that I use for accessing the
internals of object memory slots in the real object memory). This shows
the internal storage of float values in a FloatArray. I poked various
values into the array so you can see where they are stored in the 64-bit
object memory words.

The values in a FloatArray are 32-bit floats, packed into 64-bit slots
in the object memory. There are no endian issues to worry about. On both
32-bit and 64-bit object memories, the values are arranged in the order
of an (int *) access. In other words, they are arrays of 32-bit values
that just happen to be stuffed onto slots that the object memory thinks
are 64-bit words.

Of course, storage of 32-bit floats in FloatArray is unrelated to the
original topic of Float swizzling.
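
The layout described above can be sketched in C as follows. This is only an illustration with hypothetical function names, not the FloatArray primitives themselves. It assumes sizeof(float) is 4 and uses zero-based indices. The body is allocated in 64-bit slots but is read and written purely as a sequence of 32-bit elements, so element i always sits at byte offset 4 * i on either endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the 32-bit float at zero-based `index` from a body that the
   object memory allocates in 64-bit slots. */
float float_array_at(const uint64_t *slots, size_t slot_count, size_t index)
{
    float value;
    assert(index < slot_count * 2);
    memcpy(&value, (const unsigned char *)slots + index * sizeof(float), sizeof value);
    return value;
}

/* Store a 32-bit float at zero-based `index` in the same body. */
void float_array_at_put(uint64_t *slots, size_t slot_count, size_t index, float value)
{
    assert(index < slot_count * 2);
    memcpy((unsigned char *)slots + index * sizeof(float), &value, sizeof value);
}
```

Which half of a 64-bit slot holds a given element differs between big-endian and little-endian hosts when the slot is viewed as a 64-bit integer, but nothing in this access path ever views it that way.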

> (BTW, is reverseWordsFrom:to: broken for 64-bit images?)

As far as I know, there are no problems with this. The original 64-bit
image was done on a big-endian box, and descendants of that image are
running on my little-endian box today, so #reverseWordsFrom:to: must
have worked.

Dave


Attachment: world.png (70K)

Re: float word order

johnmci
 

On 19-Apr-09, at 8:37 AM, David T. Lewis wrote:

> The values in a FloatArray are 32-bit floats, packed into 64-bit slots
> in the object memory. There are no endian issues to worry about. On both
> 32-bit and 64-bit object memories, the values are arranged in the order
> of an (int *) access. In other words, they are arrays of 32-bit values
> that just happen to be stuffed onto slots that the object memory thinks
> are 64-bit words.


Well, that's not quite true. You have to be careful here because people
might move data in and out of the FloatArray, but let's see..


MatrixTransform2x3>>at: index put: value
        <primitive: 'primitiveAtPut' module: 'FloatArrayPlugin'>
        value isFloat
                ifTrue:[self basicAt: index put: value asIEEE32BitWord]
                ifFalse:[self at: index put: value asFloat].
        ^value


CGPoint>>x: aValue
        self unsignedLongAt: 1 put: aValue asFloat asIEEE32BitWord bigEndian: SmalltalkImage current isBigEndian.


OK, I'll assume without looking that the reverseBytesInImage logic swaps
the bytes in the FloatArray at load time, so that accessors use
SmalltalkImage current isBigEndian to move data in and out in the
proper form.




Re: float word order

David T. Lewis
 
On Sun, Apr 19, 2009 at 11:08:14AM -0700, John M McIntosh wrote:

>
> On 19-Apr-09, at 8:37 AM, David T. Lewis wrote:
>
> > The values in a FloatArray are 32-bit floats, packed into 64-bit slots
> > in the object memory. There are no endian issues to worry about. On both
> > 32-bit and 64-bit object memories, the values are arranged in the order
> > of an (int *) access. In other words, they are arrays of 32-bit values
> > that just happen to be stuffed onto slots that the object memory thinks
> > are 64-bit words.
>
> Well that's not quite true, you have to be careful here because might  
> people move data in and out of
> the FloatArray, but let's see..
>
>
> MatrixTransform2x3>>at: index put: value
> <primitive: 'primitiveAtPut' module: 'FloatArrayPlugin'>
> value isFloat
> ifTrue:[self basicAt: index put: value asIEEE32BitWord]
> ifFalse:[self at: index put: value asFloat].
> ^value
>
>
> CGPoint>>x: aValue
> self unsignedLongAt: 1 put:  aValue asFloat asIEEE32BitWord  
> bigEndian: SmalltalkImage current  isBigEndian.

As near as I can tell all accesses to FloatArray and IntegerArray are
on 32 bit boundaries for both 32-bit and 64-bit images, and are not
impacted by host endianness.

I should mention that I have not tried FloatArrayPlugin on 64-bit
images; I should probably have a look at that one of these days.

> Ok, well the reverseBytesInImage logic I'll assume without looking is  
> swapping the bytes in the FloatArray at load time so that accessors
> use SmalltalkImage current isBigEndian to move data in/out in the  
> proper form.

Yes, the bytes in a FloatArray would be swapped at load time if moving
from one endianness to another, but no I don't think that #isBigEndian
is required for accessing the ints or floats on 32 bit boundaries.

Also, a 64-bit image containing FloatArray or IntegerArray instances
should be correctly byte swapped when moved from one endianness to
another, although I have never actually tried it so I can't say for
sure.

Bottom line: This stuff pretty much just works, no special cases to
worry about.

Dave