Spur with Immediate Floating Point Support implies a break

Eliot Miranda-2
Hi All,

    some of you have been brave enough to use Spur and may have got used to being able to update.  Recently I've updated Spur with support for immediate floating-point values in 64-bit Spur.  Alas, these changes are not amenable to a straightforward Monticello update.

Now that I've updated Kernel.spur with these changes, you'll not be able to simply update your image.  There /may/ be a chance of being able to update if you first file in MorphFloat.st (attached).  It worked for me.  So in a recent Spur image, file in MorphFloat.st and then update.  If things get stuck on a partial update of Kernel.spur-eem.867(blah).mcd, then load Kernel.spur-eem.867.mcz manually and then update again.  If this doesn't work, apologies.

What you can definitely do is download the latest Spur image from www.mirandabanda.org/files/Cog/SpurImages/2014-12-01 and rebuild.
--
best,
Eliot



Re: Spur with Immediate Floating Point Support implies a break

Eliot Miranda-2


MorphFloat.st (6K) Download Attachment

Re: Spur with Immediate Floating Point Support implies a break

Levente Uzonyi-2
In reply to this post by Eliot Miranda-2
Hi Eliot,

It's a bit off-topic, but shouldn't there be a primitive that can convert
a float from the boxed representation to immediate? Something like
primNormalizePositive for LargePositiveIntegers. I know it's possible
(or at least it should be, see below) to do it with an operation which has
no effect, but a dedicated primitive looks more natural to me.

Another thing is that it seems like the VM doesn't want to create
SmallFloat64 instances at all:

1.0 class "==> BoxedFloat64"

Maybe it's just the compiler not "normalizing":

(1.0 + 0.0) class "==> BoxedFloat64"
1.0 sin class "==> BoxedFloat64"

No, the plugin doesn't "normalize" either.

Levente



Re: Spur with Immediate Floating Point Support implies a break

Eliot Miranda-2
Hi Levente,

 
On Mon, Dec 1, 2014 at 7:52 PM, Levente Uzonyi <[hidden email]> wrote:
> Hi Eliot,
>
> It's a bit off-topic, but shouldn't there be a primitive that can convert a float from the boxed representation to immediate? Something like primNormalizePositive for LargePositiveIntegers. I know it's possible (or at least it should be, see below) to do it with an operation which has no effect, but a dedicated primitive looks more natural to me.

I was assuming that any of add or subtract positive or negative zero, or multiply or divide by 1.0 would do the trick.  Why wouldn't this be adequate?

> Another thing is that it seems like the VM doesn't want to create SmallFloat64 instances at all:
>
> 1.0 class "==> BoxedFloat64"
>
> Maybe it's just the compiler not "normalizing":
>
> (1.0 + 0.0) class "==> BoxedFloat64"
> 1.0 sin class "==> BoxedFloat64"
>
> No, the plugin doesn't "normalize" either.

Ah, I see.  Hang on.  There is no support for SmallFloat64 on 32-bit Spur.  Only in a 64-bit image/on a 64-bit Spur VM will you be able to create instances of SmallFloat64.  And so far I only have this working in the VM simulator.  I've yet to try and create a real VM, and even then it will only be a Stack VM.

> Levente


--
best,
Eliot



Re: [Vm-dev] Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Levente Uzonyi-2
On Mon, 1 Dec 2014, Eliot Miranda wrote:

> I was assuming that any of add or subtract positive or negative zero, or
> multiply or divide by 1.0 would do the trick.  Why wouldn't this be
> adequate?

I think "x + 0.0" is adequate, but unnatural. It reminds me of
JavaScript's typecast hacks.

> Ah, I see.  Hang on.  There is no support for SmallFloat64 on 32-bit Spur.
> Only in a 64-bit image/on a 64-bit Spur VM will you be able to create
> instances of SmallFloat64.  And so far I only have this working in the VM
> simulator.  I've yet to try and create a real VM, and even then it will
> only be a Stack VM.

Wouldn't it be possible to support them in a 32-bit VM? Aren't object
headers the same in both VMs? Or is it because of the difference in
alignment?

Levente


Re: [Vm-dev] Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Eliot Miranda-2
Hi Levente,

On Wed, Dec 3, 2014 at 3:08 PM, Levente Uzonyi <[hidden email]> wrote:
> On Mon, 1 Dec 2014, Eliot Miranda wrote:
>
>> I was assuming that any of add or subtract positive or negative zero, or multiply or divide by 1.0 would do the trick.  Why wouldn't this be adequate?
>
> I think "x + 0.0" is adequate, but unnatural. It reminds me of JavaScript's typecast hacks.
>
>> Ah, I see.  Hang on.  There is no support for SmallFloat64 on 32-bit Spur.
>> Only in a 64-bit image/on a 64-bit Spur VM will you be able to create
>> instances of SmallFloat64.  And so far I only have this working in the VM
>> simulator.  I've yet to try and create a real VM, and even then it will
>> only be a Stack VM.
>
> Wouldn't it be possible to support them in a 32-bit VM? Aren't object headers the same in both VMs? Or is it because of the difference in alignment?

SmallFloat64 is an immediate tagged representation, like SmallInteger, so they fit within an object pointer and have no header.  In 64-bit Spur there is a 3-bit tag, leaving 61 bits.  SmallFloat64 steals 3 bits from the 11-bit exponent to donate to the tags, representing a full double-precision floating-point value that is restricted to the ~ +/-10^+/-38 range.  There's really no practical way to shoehorn a usable range of 64-bit floats into a 30-bit value.  It's possible, but so few values would fit that the effort would be counter-productive.  Does this make sense now?
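
To make the bit budget concrete, here is a minimal Python sketch of squeezing a double into a 61-bit payload plus a 3-bit tag. It is an illustration only: the tag value `FLOAT_TAG`, the exponent window `EXP_OFFSET`, the position of the sign bit, and all function names are assumptions made for this example, and are not necessarily the layout Spur actually uses.

```python
import struct

TAG_BITS = 3
FLOAT_TAG = 0b100            # hypothetical tag value for immediate floats
EXP_OFFSET = 896             # assumed window: IEEE biased exponents 897..1151 map to 1..255
MANTISSA_MASK = (1 << 52) - 1


def double_to_bits(x):
    """Return the 64 raw IEEE 754 bits of a Python float as an int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_double(bits):
    """Inverse of double_to_bits."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def try_encode(x):
    """Return a 64-bit tagged word for x, or None if x would have to stay boxed."""
    bits = double_to_bits(x)
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & MANTISSA_MASK
    if exponent == 0 and mantissa == 0:
        small_exponent = 0       # +0.0 and -0.0 use the reserved exponent 0
    else:
        small_exponent = exponent - EXP_OFFSET
        if not 1 <= small_exponent <= 255:
            return None          # too large, too small, subnormal, infinity or NaN
    # 8 exponent bits + 52 mantissa bits + 1 sign bit = 61 bits
    payload = (small_exponent << 53) | (mantissa << 1) | sign
    return (payload << TAG_BITS) | FLOAT_TAG


def decode(word):
    """Recover the double from a word produced by try_encode."""
    assert word & 0b111 == FLOAT_TAG
    payload = word >> TAG_BITS
    sign = payload & 1
    mantissa = (payload >> 1) & MANTISSA_MASK
    small_exponent = payload >> 53
    exponent = 0 if small_exponent == 0 else small_exponent + EXP_OFFSET
    return bits_to_double((sign << 63) | (exponent << 52) | mantissa)
```

Under these assumptions, values such as 1.0 or 3.14 round-trip with all 52 mantissa bits intact, while 1e39, 1e-40, infinities and NaNs make `try_encode` return `None`, which corresponds to staying with a boxed float. Only the exponent range is narrowed; the precision is untouched.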










--
best,
Eliot



Re: [Vm-dev] Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Levente Uzonyi-2
Hi Eliot,

On Wed, 3 Dec 2014, Eliot Miranda wrote:

> SmallFloat64 is an immediate tagged representation, like SmallInteger, so
> they fit within an object pointer and have no header.  In 64-bit Spur there
> is a 3-bit tag, leaving 61 bits.  SmallFloat64 steals 3 bits from the 11-bit
> exponent to donate to the tags, representing a full double precision
> floating-point value that is restricted to the ~ +/-10^+/-38 range.
> There's really no practical way to shoe-horn a usable range of 64-bit float
> into a 30-bit value.  It's possible but so few values would fit that the
> effort would be counter-productive.  Does this make sense now?

I didn't mean to use 30-bit values. I meant to use the same 61-bit
representation as with the 64-bit Spur.
The object header is 64 bits long in both 32-bit and 64-bit Spur, right?
If yes, then why is it not possible to detect the tag of SmallFloat64 in a
32-bit VM, and treat the object as immediate?

About the "normalizer" primitive, I think it would be better than using
an arithmetic operation, because - if I'm not mistaken - it's possible to
convert the object in-place instead of creating a new one.

Levente


Re: [Vm-dev] Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Bert Freudenberg

> On 04.12.2014, at 04:18, Levente Uzonyi <[hidden email]> wrote:
>
> Hi Eliot,
>
> On Wed, 3 Dec 2014, Eliot Miranda wrote:
>
>> SmallFloat64 is an immediate tagged representation, like SmallInteger, so
>> they fit within an object pointer and have no header.  In 64-bit Spur there
>> is a 3-bit tag, leaving 61 bits.  SmallFloat64 steals 3 bits from the 11-bit
>> exponent to donate to the tags, representing a full double precision
>> floating-point value that is restricted to the ~ +/-10^+/-38 range.
>> There's really no practical way to shoe-horn a usable range of 64-bit float
>> into a 30-bit value.  It's possible but so few values would fit that the
>> effort would be counter-productive.  Does this make sense now?
>
> I didn't mean to use 30-bit values. I meant to use the same 61-bit representation as with the 64-bit Spur.
> The object header is 64 bits long in both 32-bit and 64-bit Spur, right?
> If yes, then why is it not possible to detect the tag of SmallFloat64 in a 32-bit VM, and treat the object as immediate?

Because that is not what "immediate" means. There is no header, and not even an object. The value is encoded in the oop itself. You can't fit 61 bits in a 32-bit oop.

I explained this previously, but I'll paste again:

> The Squeak VM (and Cog and Spur) traditionally use 32 bits to identify an object. When you store a reference to an object into some other object, the VM actually stores a 32-bit word to some place in main memory.
>
> When you use a Float in your code, the VM actually allocates 96 bits somewhere in memory (a 32-bit header for housekeeping and 64 bits for the IEEE double) and gives you a 32-bit word back, which is a pointer to that object (we also call that an "oop"). This is called "boxing": it wraps the double inside an object. When you add two floats (say 3.0 + 4.0), the VM actually creates two objects and hands you back their oops (e.g. the two hexadecimal numbers @12345600 and @1ABCDE00). Then to add them, the VM reads 64 bits from the memory addresses 12345604 and 1ABCDE04 (skipping the object header), adds these two doubles, allocates another 96 bits in memory (say @56780000), and writes 64 bits of the result to the address 56780004.
>
> If this sounds expensive to you, that's because it is. It is even more expensive than that because we have just created 3*96 = 288 bits of garbage that needs to be cleaned up later, otherwise we would soon run out of memory if we keep allocating. Since everything in Smalltalk is an object, that is what the VM has to do.
>
> But there is a trick. The VM uses it to avoid all this allocating and memory fetching for the most common operations, namely working with smallish integers, which are used everywhere.
>
> That trick is to hide some data in the oop itself. In the 32 bits of object pointers, the lowest two bits are actually always 0, because objects are always allocated at addresses that are a multiple of 4 (32 bits = 4 bytes). If these are always 0, we don't actually need to store them. But since there is no good way to store just 30 bits, we can also use those two bits for something else.
>
> And we do. The VM currently just uses one bit, the least significant bit (LSB). If the LSB is 0, this is a regular pointer to an object in main memory. If the LSB is 1, then the VM uses the other 31 bits to store an integer. Inside the oop itself, not at some place in memory! It does not need to be allocated, or garbage-collected. It's just there, hidden inside the 32-bit oop.
>
> This makes operations on these "small integers" extremely efficient. To add e.g. 3 and 4, the VM gets the oops @00000007 and @00000009, shifts them 1 bit to get the actual integers (7 >> 1 = 3 and 9 >> 1 = 4), adds them, and shifts it back, sets the LSB, and answers @0000000F. All this happens in CPU registers, no memory access needed, which is why this is so fast. Access to main memory is orders of magnitude slower than register access.
>
> We call that an "immediate object". The Squeak VM currently uses only one kind of immediate object, although there could be more, since we still have an unused bit. It would be great to speed up floating point operations, too. But there is no way to hide a 64-bit double in a 32-bit oop.
>
> Which brings us to the proposed 64-bit object format. Objects are allocated in chunks of 64 bits = 8 bytes, meaning addresses are multiples of 8, leaving the 3 lowest bits for identifying immediate objects.
>
> But there still is no way to hide a 64-bit double inside a 64-bit oop, because the VM needs at least 1 bit to distinguish between regular object pointers and immediate objects.
>
> So Eliot is proposing a 61-bit immediate Float which (just like SmallIntegers) the VM can process using register operations only. This will be a major boost for most floating point operations (as long as your values are not larger than 10^38).
>
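
The small-integer trick described above can be modelled in a few lines of Python. This is only a sketch of the arithmetic on 32-bit words as the explanation describes it, not VM code; the function names are made up for the example.

```python
WORD_MASK = 0xFFFFFFFF       # an oop is modelled as an unsigned 32-bit word
MIN_SMALL_INT = -(1 << 30)   # 31 bits of signed payload
MAX_SMALL_INT = (1 << 30) - 1


def tag_small_int(value):
    """Shift the integer left by one bit and set the LSB to mark it as immediate."""
    if not MIN_SMALL_INT <= value <= MAX_SMALL_INT:
        raise OverflowError("does not fit in 31 bits; a boxed large integer is needed")
    return ((value << 1) | 1) & WORD_MASK


def is_small_int(oop):
    """LSB 1 means immediate integer, LSB 0 means pointer to an object in memory."""
    return oop & 1 == 1


def untag_small_int(oop):
    """Arithmetic shift right by one bit, treating the word as signed."""
    assert is_small_int(oop)
    if oop & 0x80000000:
        oop -= 1 << 32       # reinterpret the 32-bit pattern as a negative number
    return oop >> 1


def add_small_ints(oop_a, oop_b):
    """Add two immediate integers without touching any heap memory."""
    return tag_small_int(untag_small_int(oop_a) + untag_small_int(oop_b))
```

With this model, `tag_small_int(3)` is `0x7`, `tag_small_int(4)` is `0x9`, and `add_small_ints(0x7, 0x9)` is `0xF`, matching the worked example in the text. A word such as `0x12345600` has its low bit clear, so `is_small_int` treats it as an ordinary pointer.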
- Bert -







Re: [Vm-dev] Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Nicolas Cellier
In reply to this post by Levente Uzonyi-2


2014-12-04 0:08 GMT+01:00 Levente Uzonyi <[hidden email]>:
> On Mon, 1 Dec 2014, Eliot Miranda wrote:
>
>> I was assuming that any of add or subtract positive or negative zero, or multiply or divide by 1.0 would do the trick.  Why wouldn't this be adequate?
>
> I think "x + 0.0" is adequate, but unnatural. It reminds me of JavaScript's typecast hacks.

No, it's not; it would transform a negative zero into a positive one...
Maybe x - 0.0
 







Re: [Vm-dev] Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Eliot Miranda-2
Hi Nicolas,

On Dec 4, 2014, at 1:51 AM, Nicolas Cellier <[hidden email]> wrote:
> 2014-12-04 0:08 GMT+01:00 Levente Uzonyi <[hidden email]>:
>> On Mon, 1 Dec 2014, Eliot Miranda wrote:
>>
>>> I was assuming that any of add or subtract positive or negative zero, or multiply or divide by 1.0 would do the trick.  Why wouldn't this be adequate?
>>
>> I think "x + 0.0" is adequate, but unnatural. It reminds me of JavaScript's typecast hacks.
>
> No, it's not, it would transform a negative zero into a positive one...
> Maybe x - 0.0

What does x * 1.0 do to negative zero?

 








Re: [Vm-dev] Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Levente Uzonyi-2
In reply to this post by Bert Freudenberg
Thanks Bert and Dave. I feel kinda stupid for mixing pointers with
headers. Maybe I shouldn't write mails so late in the evening...

Levente



Re: [Vm-dev] Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Nicolas Cellier
In reply to this post by Eliot Miranda-2


2014-12-04 19:22 GMT+01:00 Eliot Miranda <[hidden email]>:
> Hi Nicolas,
>
> On Dec 4, 2014, at 1:51 AM, Nicolas Cellier <[hidden email]> wrote:
>
>> 2014-12-04 0:08 GMT+01:00 Levente Uzonyi <[hidden email]>:
>>> On Mon, 1 Dec 2014, Eliot Miranda wrote:
>>>
>>>> I was assuming that any of add or subtract positive or negative zero, or multiply or divide by 1.0 would do the trick.  Why wouldn't this be adequate?
>>>
>>> I think "x + 0.0" is adequate, but unnatural. It reminds me of JavaScript's typecast hacks.
>>
>> No, it's not, it would transform a negative zero into a positive one...
>> Maybe x - 0.0
>
> What does x * 1.0 do to negative zero?

x * 1.0 is perfectly OK.

You can test this in a classic Cog VM: -0.0 * 1.0 -> -0.0, whereas -0.0 + 0.0 -> 0.0
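
The same behaviour can be checked outside Smalltalk, since it comes from IEEE 754 arithmetic rather than from the VM. The following Python sketch tests which of the candidate "no-op" operations mentioned in this thread leave both zeros unchanged, sign included; the helper names are made up for the example.

```python
import math


def is_negative_zero(x):
    """True only for -0.0 (a plain == comparison cannot tell the two zeros apart)."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0


def preserves_signed_zero(op):
    """True if op maps -0.0 to -0.0 and +0.0 to +0.0."""
    negative_ok = is_negative_zero(op(-0.0))
    positive = op(0.0)
    positive_ok = positive == 0.0 and not is_negative_zero(positive)
    return negative_ok and positive_ok


candidates = {
    "x + 0.0": lambda x: x + 0.0,
    "x - 0.0": lambda x: x - 0.0,
    "x + -0.0": lambda x: x + -0.0,
    "x * 1.0": lambda x: x * 1.0,
    "x / 1.0": lambda x: x / 1.0,
}

results = {name: preserves_signed_zero(op) for name, op in candidates.items()}
```

In the default rounding mode only `x + 0.0` fails the check, because -0.0 + 0.0 is +0.0. Subtracting 0.0, adding -0.0, and multiplying or dividing by 1.0 all keep the sign of zero.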

 












Re: [Vm-dev] Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Ben Coman
In reply to this post by Bert Freudenberg
Bert Freudenberg wrote:
> I explained this previously, but I'll paste again:
> [...]

btw, I forgot to say so when you last wrote that, it was enlightening -
thanks for taking the time to write it.
cheers -ben


Re: [Vm-dev] Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Eliot Miranda-2


On Thu, Dec 4, 2014 at 2:46 PM, Ben Coman <[hidden email]> wrote:
Bert Freudenberg wrote:
On 04.12.2014, at 04:18, Levente Uzonyi <[hidden email]> wrote:

Hi Eliot,

On Wed, 3 Dec 2014, Eliot Miranda wrote:

SmallFloat64 is an immediate tagged representation, like SmallInteger, so
they fit within an object pointer and have no header.  In 64-bit Spur there
is a 3-bit tag, leaving 61 bits.  SmallFloat64 steals 3 bits from the 11-bit
exponent to donate to the tags, representing a full double precision
floating-point value that is restricted to the ~ +/-10^+/-38 range.
There's really no practical way to shoe-horn a usable range of 64-bit float
into a 30-bit value.  It's possible but so few values would fit that the
effort would be counter-productive.  Does this make sense now?
I didn't mean to use 30-bit values. I meant to use the same 61-bit representation as with the 64-bit Spur.
The object header is 64 bits long in both 32-bit and 64-bit Spur, right?
If yes, then why is it not possible to detect the tag of SmallFloat64 in a 32-bit VM, and treat the object as immediate?

Because that is not what "immediate" means. There is no header, and not even an object. The value is encoded in the oop itself. You can't fit 61 bits in a 32 bit oop.

I explained this previously, but I'll paste again:

The Squeak VM (and Cog and Spur) traditionally use 32 bits to identify an object. When you store a reference to an object into some other object, the VM actually stores a 32 bit word to some place in main memory.

When you use a Float in your code, the VM actually allocates 96 bits somewhere in memory (a 32-bit header for house keeping and 64 bits for the IEEE double) and gives you a 32-bit word back, which is a pointer to that object (we also call that an "oop"). This is called "boxing", it wraps the double inside an object. When you add two floats (say 3.0 + 4.0), the VM actually creates two objects and hands you back their oops (e.g. the two hexadecimal numbers @12345600 and @1ABCDE00). Then to add them, the VM reads 64 bits from the memory addresses 12345604 and 1ABCDE04 (skipping the object header), adds these two doubles, allocates another 96 bits in memory (say @56780000), and writes 64 bits of the result to the address 56780004.
If this sounds expensive to you, that's because it is. It is even more expensive than that because we have just created 3*96 = 288 bits of garbage that needs to be cleaned up later, otherwise we would soon run out of memory if we keep allocating. Since everything in Smalltalk is an object, that is what the VM has to do.
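
To make the cost concrete, here is a minimal toy model of the boxed addition described above. It is only a sketch: the class ToyHeap, its method names, the zero placeholder header, and the bump allocation starting at address 0 are all assumptions made for illustration, not how the VM is written.

```python
import struct

class ToyHeap:
    """Hypothetical bump allocator modelling 96-bit boxed Floats."""
    HEADER_BYTES = 4  # 32-bit header, as in the description above

    def __init__(self):
        self.memory = bytearray()

    def box_float(self, value):
        # Allocate 4 header bytes + 8 data bytes and answer the "oop"
        # (here simply the byte offset of the header).
        oop = len(self.memory)
        self.memory += struct.pack("<I", 0)      # placeholder header word
        self.memory += struct.pack("<d", value)  # the IEEE double
        return oop

    def unbox_float(self, oop):
        # Skip the header and read the 64-bit double behind it.
        return struct.unpack_from("<d", self.memory, oop + self.HEADER_BYTES)[0]

    def add_floats(self, a_oop, b_oop):
        # Two memory reads, one addition, one fresh allocation.
        return self.box_float(self.unbox_float(a_oop) + self.unbox_float(b_oop))

heap = ToyHeap()
a = heap.box_float(3.0)
b = heap.box_float(4.0)
c = heap.add_floats(a, b)
print(heap.unbox_float(c), len(heap.memory) * 8, "bits allocated")
```

Every intermediate result costs another 12 bytes in this model, which is the garbage the paragraph above is talking about.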

But there is a trick. The VM uses it to avoid all this allocating and memory fetching for the most common operations, namely working with smallish integers, which are used everywhere.

That trick is to hide some data in the oop itself. In the 32 bits of object pointers, the lowest two bits are actually always 0, because objects are always allocated at addresses that are a multiple of 4 (32 bits = 4 bytes). If these are always 0, we don't actually need to store them. But since there is no good way to store just 30 bits, we can also use those two bits for something else.

And we do. The VM currently just uses one bit, the least significant bit (LSB). If the LSB is 0, this is a regular pointer to an object in main memory. If the LSB is 1, then the VM uses the other 31 bits to store an integer. Inside the oop itself, not at some place in memory! It does not need to be allocated, or garbage-collected. It's just there, hidden inside the 32-bit oop.

This makes operations on these "small integers" extremely efficient. To add e.g. 3 and 4, the VM gets the oops @00000007 and @00000009, shifts them 1 bit to get the actual integers (7 >> 1 = 3 and 9 >> 1 = 4), adds them, and shifts it back, sets the LSB, and answers @0000000F. All this happens in CPU registers, no memory access needed, which is why this is so fast. Access to main memory is orders of magnitude slower than register access.
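
The same arithmetic can be written out as a small sketch. The function names are made up for this example, and Python integers stand in for 32-bit registers, so the masking and sign handling are spelled out by hand.

```python
MASK32 = 0xFFFFFFFF

def encode_small_int(value):
    # A 31-bit signed value goes in the upper bits, with the tag bit (LSB) set to 1.
    assert -(1 << 30) <= value < (1 << 30), "does not fit in 31 bits"
    return ((value << 1) | 1) & MASK32

def is_small_int(oop):
    return (oop & 1) == 1

def decode_small_int(oop):
    # Reinterpret the 32-bit word as signed, then shift the tag bit away.
    if oop & 0x80000000:
        oop -= 1 << 32
    return oop >> 1

def add_small_ints(a_oop, b_oop):
    # Untag, add, retag. No memory access is involved.
    return encode_small_int(decode_small_int(a_oop) + decode_small_int(b_oop))

print(hex(add_small_ints(encode_small_int(3), encode_small_int(4))))
```

In this sketch a result that no longer fits in 31 bits simply trips the assertion; a real VM has to fall back to a heap-allocated large integer at that point.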

We call that an "immediate object". The Squeak VM currently uses only one kind of immediate object, although there could be more, since we still have an unused bit. It would be great to speed up floating point operations, too. But there is no way to hide a 64-bit double in a 32 bit oop.

Which brings us to the proposed 64-bit object format. Objects are allocated in chunks of 64 bits = 8 bytes, meaning addresses are multiples of 8, leaving the 3 lowest bits for identifying immediate objects.

But there still is no way to hide a 64-bit double inside a 64-bit oop, because the VM needs at least 1 bit to distinguish between regular object pointers and immediate objects.

So Eliot is proposing a 61-bit immediate Float which (just like SmallIntegers) the VM can process using register operations only. This will be a major boost for most floating point operations (as long as your values are not larger than 10^38).
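
A rough way to see which doubles could qualify is to check whether the exponent still fits after 3 of its 11 bits are given up. The sketch below is an assumption-laden illustration: the function names are invented, and the exponent window (biased exponents 896 to 1151, i.e. the single-precision range re-centred on the double bias) and the special-casing of zero are my guesses at a layout consistent with the ~10^38 figure, not a statement of Spur's exact encoding.

```python
import struct

DOUBLE_BIAS = 1023
LOW_EXPONENT = DOUBLE_BIAS - 127    # 896: assumed smallest biased exponent kept
HIGH_EXPONENT = LOW_EXPONENT + 255  # 1151: 8 exponent bits give 256 values

def biased_exponent(x):
    # Reinterpret the double as a 64-bit integer and pull out the 11 exponent bits.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF

def fits_in_small_float(x):
    # Assumption: zero is treated as a special case that always fits.
    if x == 0.0:
        return True
    return LOW_EXPONENT <= biased_exponent(x) <= HIGH_EXPONENT

for value in (1.0, 3.14, 1e38, 1e39, 1e-39, float("inf")):
    print(value, fits_in_small_float(value))
```

Values outside the window would still work as Floats; they would just have to stay boxed on the heap.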

btw, I forgot to say so when you last wrote that, it was enlightening - thanks for taking the time to write it.

If the information was in a class comment somewhere would you have found it and read it?  

 
cheers -ben




--
best,
Eliot



Re: [Vm-dev] [squeak-dev] Spur with Immediate Floating Point Support implies a break

Chris Cunnington-4

On Dec 4, 2014, at 6:00 PM, Eliot Miranda <[hidden email]> wrote:




If the information was in a class comment somewhere would you have found it and read it?  

Yes, but I don't know how much help it would have been. Often class comments are slivers of a bigger picture. I've read the class comment for ObjectMemory. [1] I've read Igor's pdf from a Lille in 2011. 


And it's still pretty confusing. The fact that it can confuse Levente Uzonyi is both liberating and a salutary lesson. And as I've opened my big mouth, I may as well put my head on the chopping block. 

- the heap readdresses 32-bit locations, so they are not the same as the addresses in memory, the numbers on the metal 
- a 32-bit address can be an immediate object, because the address can be a 30-bit number. (i.e. @00000007) 
- a 32-bit address can be a reference to a 96-bit object somewhere. The 32-bit address leads to a header that describes the object pursuant to the ObjectMemory comment [1]

If these things are true, then doesn't that mean every time a number is used, then that 32-bit space, which could have been an address, is now invalid as an address to an object? If that's right, then it's a tad odd, right? The more math you do then the more @00000007 and @00000008 numbers are consumed leaving fewer, of the possible 4.3 billion 32-bit words to address objects with headers. And if that's true, then some intelligence in the VM somewhere is saying: "No, that address is working as an 'immediate object' for math. You'll have to use another 32-bit address to lead you to the 96-bits that come complete with a header". 

Sometimes the number on the post office box is the datum. Other times, you need to look inside the post office box for the datum. 

Chris 


[1] 
This class describes a 32-bit direct-pointer object memory for Smalltalk.  The model is very simple in principle:  a pointer is either a SmallInteger or a 32-bit direct object pointer.

SmallIntegers are tagged with a low-order bit equal to 1, and an immediate 31-bit 2s-complement signed value in the rest of the word.

All object pointers point to a header, which may be followed by a number of data fields.  This object memory achieves considerable compactness by using a variable header size (the one complexity of the design).  The format of the 0th header word is as follows:

3 bits reserved for gc (mark, root, unused)
12 bits object hash (for HashSets)
5 bits compact class index
4 bits object format
6 bits object size in 32-bit words
2 bits header type (0: 3-word, 1: 2-word, 2: forbidden, 3: 1-word)
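
For anyone who wants to poke at that layout, here is a small sketch that packs and unpacks the six fields of the 0th header word. It assumes the fields are listed from the most significant bits down to the least significant (so the header type sits in the low 2 bits); the field names and helper functions are invented for the example.

```python
# (name, width in bits), assumed to run from the high bits to the low bits
HEADER_FIELDS = [
    ("gc", 3),
    ("hash", 12),
    ("compact_class", 5),
    ("format", 4),
    ("size_words", 6),
    ("header_type", 2),
]

def decode_header(word):
    # Peel each field off the 32-bit word, starting from the top.
    fields = {}
    shift = 32
    for name, width in HEADER_FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields

def encode_header(**values):
    # Inverse of decode_header; fields that are not given default to 0.
    word = 0
    shift = 32
    for name, width in HEADER_FIELDS:
        shift -= width
        value = values.get(name, 0)
        assert 0 <= value < (1 << width), name + " out of range"
        word |= value << shift
    return word

word = encode_header(hash=0xABC, compact_class=3, format=2, size_words=5, header_type=3)
print(hex(word), decode_header(word))
```

The widths add up to exactly 32 bits, which is a quick sanity check on the list above.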

If a class is in the compact class table, then this is the only header information needed.  If it is not, then it will have another header word at offset -4 bytes with its class in the high 30 bits, and the header type repeated in its low 2 bits.  If the object's size is greater than 255 bytes, then it will have yet another header word at offset -8 bytes with its full word size in the high 30 bits and its header type repeated in the low two bits.

The object format field provides the remaining information as given in the formatOf: method (including isPointers, isVariable, isBytes, and the low 2 size bits of byte-sized objects).

This implementation includes incremental (2-generation) and full garbage collection, each with compaction and rectification of direct pointers.  It also supports a bulk-become (exchange object identity) feature that allows many objects to be becomed at once, as when all instances of a class must be grown or shrunk.

There is now a simple 64-bit version of the object memory.  It is the simplest possible change that could work.  It merely sign-extends all integer oops, and extends all object headers and oops by adding 32 zeroes in the high bits.  The format of the base header word is changed in one minor, not especially elegant, way.  Consider the old 32-bit header:
ggghhhhhhhhhhhhcccccffffsssssstt
The 64-bit header is almost identical, except that the size field (now being in units of 8 bytes) has a zero in its low-order bit.  At the same time, the byte-size residue bits for byte objects, which are in the low order bits of formats 8-11 and 12-15, are now in need of another bit of residue.  So, the change is as follows:
ggghhhhhhhhhhhhcccccffffsssssrtt
where bit r supplies the 4's bit of the byte size residue for byte objects.  Oh, yes, this is also needed now for 'variableWord' objects, since their size in 32-bit words requires a low-order bit.

See the comment in formatOf: for the change allowing for 64-bit wide bitmaps, now dubbed 'variableLong'.

 
cheers -ben




-- 
best,
Eliot




Re: [Vm-dev] [squeak-dev] Spur with Immediate Floating Point Support implies a break

Bert Freudenberg
On 05.12.2014, at 00:50, Chris Cunnington <[hidden email]> wrote:

- the heap readdresses 32-bit locations, so they are not the same as the addresses in memory, the numbers on the metal 
- a 32-bit address can be an immediate object, because the address can be a 30-bit number. (i.e. @00000007) 
- a 32-bit address can be a reference to a 96-bit object somewhere. The 32-bit address leads to a header that describes the object pursuant to the ObjectMemory comment [1]

If these things are true, then doesn't that mean every time a number is used, then that 32-bit space, which could have been an address, is now invalid as an address to an object? If that's right, then it's a tad odd, right? The more math you do then the more @00000007 and @00000008 numbers are consumed leaving fewer, of the possible 4.3 billion 32-bit words to address objects with headers.

That is exactly right, you understood perfectly fine :)

And if that's true, then some intelligence in the VM somewhere is saying: "No, that address is working as an 'immediate object' for math. You'll have to use another 32-bit address to lead you to the 96-bits that come complete with a header”. 

Indeed, only one in four words is a valid address for an object. But this is actually optimal for a 32-bit processor. Its data bus is 32 bits wide. It cannot read a single byte from memory, it always reads 4 bytes, 32 bits. If you wanted to read 32 bits from the address @00000007 it would have to fetch the two 32 bit words from address @00000004 and @00000008, and combine 8 bits from one with 24 bits from the other. I think Intel CPUs actually do that, whereas others just say "nope, I won’t do that, it’s silly”.

This is called “aligned” vs “unaligned” access. Unaligned access is slow, if it works at all. That is why we align all objects on addresses that are a multiple of 4 bytes.

This is not a waste of memory since most objects are a multiple of 4 bytes long anyway. That’s because each reference to an object is 32 bits, so all pointer objects are a multiple of 4 bytes long. Same for word objects. Only in byte objects we may waste 1 to 3 bytes. Wasting on average 2 bytes per string is a small price to pay for a huge gain in speed for the whole VM.
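
The rounding involved is simple enough to sketch. The helper names are made up, and the 4-byte word size matches the 32-bit case discussed here.

```python
WORD_BYTES = 4  # 32-bit words, as in the 32-bit VM

def aligned_size(byte_count):
    # Round up to the next multiple of the word size.
    return (byte_count + WORD_BYTES - 1) & ~(WORD_BYTES - 1)

def padding(byte_count):
    # Bytes that are allocated but unused at the end of a byte object.
    return aligned_size(byte_count) - byte_count

for n in range(1, 9):
    print(n, aligned_size(n), padding(n))
```

A byte object whose length is already a multiple of 4 wastes nothing; the others waste 1, 2 or 3 bytes, which is where the average of 2 comes from.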

- Bert -





Immediate and heap objects

Bert Freudenberg
In reply to this post by Bert Freudenberg
I just thought of a unified explanation for immediate and non-immediate objects. It somewhat inverts the notion of "normal", but maybe this way it is easier to understand?

--------------------------------------------------------------

In Squeak, everything is an "object". Each object has a reference to another object defining its behavior. This is called the object's "class". Many objects can reference the same class object, they are called the class's "instances". In addition to the class reference, an object may hold other data, the so-called "instance data". The interpretation of this data is defined by the class.

Each object is stored in main memory using at least 1 machine word. Different variants of Squeak use either 32 or 64 bit words. For efficiency reasons, the storage format for an object is akin to a "Huffman code", using fewer bits and words for more common kinds of objects.

How exactly the object's bits encode the class and instance data is not visible to the user. The Virtual Machine transparently handles the details and makes all objects appear alike.

Some objects encode both the class reference and instance data in 1 word. These are called "immediate objects".

Most objects do not fit in 1 word. These have a second part dynamically allocated on the heap. They are called "heap objects".

The 1-word first part (the only word in immediates) is called an "oop". It is used to reference an object from another object's instance data.

The oop has some "tag bits" and some data bits. The tag bits encode the class, and the data bits encode the instance data. One special combination of tag bits is reserved to denote heap objects. The other combinations of tag bits correspond to different classes of immediate objects.

32-bit oops have 2 tag bits. This allows four combinations of tag bits (00, 01, 10, 11). The tags 01 and 11 are used for immediate "SmallInteger" instances, which represent signed numbers between -1073741824 and 1073741823. The tag 10 will be used in Spur for immediate Characters.

64-bit oops have 3 tag bits in Spur. Only half of the 8 tags are assigned at the moment, for SmallIntegers, Characters, and SmallFloat64s.

If all tag bits in an oop are zero, this denotes a heap object. In this case, the oop does not immediately encode the class and instance data, but instead it identifies a chunk of memory where that information is stored. Such an untagged oop is used as a direct pointer into the heap.
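
As a sketch of the 32-bit case described above, the tag check boils down to looking at the two low bits. The function name and the returned strings are just illustrative choices.

```python
def classify_oop32(oop):
    # Look only at the two low-order tag bits of a 32-bit oop.
    tag = oop & 0b11
    if tag == 0b00:
        return "heap object"   # untagged: a direct pointer into the heap
    if tag == 0b10:
        return "Character"     # the tag 10, used for immediate Characters in Spur
    return "SmallInteger"      # tags 01 and 11: the low bit is 1

for oop in (0x12345600, 0x00000007, 0x00000009, 0x00000106):
    print(hex(oop), classify_oop32(oop))
```

Because both 01 and 11 mean SmallInteger, the second tag bit is really part of the integer's value, which is why SmallIntegers get 31 data bits rather than 30.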

The memory layout of heap objects is specified by the object's class. If you're interested in that layout or the actual assignment of tag bits, read Clement's excellent post:
        https://clementbera.wordpress.com/2014/01/16/spurs-new-object-format/

--------------------------------------------------------------

Of course we normally call heap objects "regular objects", and as users we rarely have to care about the distinction anyway. But maybe when we do, explaining it the other way around is actually helpful ...

- Bert -

PS: Another idea would be to distinguish between "register objects" and "memory objects" and explaining it in terms of CPU operations, like I did in my previous attempt. Actually, that may not be such a bad idea?





Re: [Vm-dev] [squeak-dev] Spur with Immediate Floating Point Support implies a break

David T. Lewis
In reply to this post by Chris Cunnington-4
On Thu, Dec 04, 2014 at 06:50:05PM -0500, Chris Cunnington wrote:
>
> And it's still pretty confusing. The fact that it can confuse Levente Uzonyi
> is both liberating and a salutary lesson. And as I've opened my big mouth,
> I may as well put my head on the chopping block.
>

I find it very confusing also. After all, we are dealing with address space
in memory, virtual address space in the OS process that maps to that memory,
a big array of object memory that got allocated somewhere within that process
virtual address space, a system of byte-addressable object pointers that refer
to locations within the object memory, and a bit of tricky object pointer
decoding that involves checking each pointer value to see if it points at a
4 or 8 byte boundary, and if not, then decode the pointer as an immediate value.
And of course, the thing that that object pointer points to is an object header
word, which might be either 32-bits or 64-bits, and which might be one of
up to three header words, unless it is Spur, in which case it always has one
header word.

You are entitled to be confused. If you were not confused the first few
times you looked at this, you probably were not paying attention.


> - the heap readdresses 32-bit locations, so they are not the same as the addresses in memory, the numbers on the metal
> - a 32-bit address can be an immediate object, because the address can be a 30-bit number. (i.e. @00000007)
> - a 32-bit address can be a reference to a 96-bit object somewhere. The 32-bit address leads to a header that describes the object pursuant to the ObjectMemory comment [1]
>
> If these things are true, then doesn't that mean every time a number is used, then that 32-bit space, which could have been an address, is now invalid as an address to an object? If that's right, then it's a tad odd, right? The more math you do then the more @00000007 and @00000008 numbers are consumed leaving fewer, of the possible 4.3 billion 32-bit words to address objects with headers. And if that's true, then some intelligence in the VM somewhere is saying: "No, that address is working as an 'immediate object' for math. You'll have to use another 32-bit address to lead you to the 96-bits that come complete with a header".
>

You are exactly right. The address space is "wasteful". The object pointers are
pointers to byte locations within the object memory space. But the positions in
the object memory are either 32-bit words (in the current object memory) or 64-bit
words (in the Squeak "format 68002" object memory or in 64 bit Spur).

So it is indeed wasteful to use byte-addressable pointers to point to 32-bit or
64-bit locations in the object memory. But there is a hidden advantage to this
waste of address space. Any object pointer that points to a byte that is not
on a 32-bit (or 64-bit) boundary is pointing at something that cannot
possibly be an object header. In a 32-bit object memory, only one of every
four addresses can be a valid location in the object memory. All the rest of
those "wasted" addresses can be used for something else.

So what should the wasted addresses be used for? Immediate objects.

Dave



Re: [Vm-dev] [squeak-dev] Spur with Immediate Floating Point Support implies a break

Eliot Miranda-2
Hi David,

On Thu, Dec 4, 2014 at 5:12 PM, David T. Lewis <[hidden email]> wrote:
On Thu, Dec 04, 2014 at 06:50:05PM -0500, Chris Cunnington wrote:
>
> And it's still pretty confusing. The fact that it can confuse Levente Uzonyi
> is both liberating and a salutary lesson. And as I've opened my big mouth,
> I may as well put my head on the chopping block.
>

I find it very confusing also. After all, we are dealing with address space
in memory, virtual address space in the OS process that maps to that memory,
a big array of object memory that got allocated somewhere within that process
virtual address space, a system of byte-addressable object pointers that refer
to locations within the object memory, and a bit of tricky object pointer
decoding that involves checking each pointer value to see if it points at a
4 or 8 byte boundary, and if not, then decode the pointer as an immediate value.
And of course, the thing that that object pointer points to is an object header
word, which might be either 32-bits or 64-bits, and which might be one of
up to three header words, unless it is Spur, in which case it always has one
header word.

Forgive the correction but in Spur an object has a 128-bit/16-byte header if it has more than 254 slots.  There's only room for an 8-bit slot count in the 64-bit header, so when an object's size overflows it overflows into a full 8-byte slot size field which precedes the normal header.  This turns out to be a good choice because a) using a slot count gives 4 or 8 times the range of a byte size (which the V3 object representation uses), and b) the overhead of the extra 8-byte header is always less than 0.8% in 32 bits and less than 0.4% in 64 bits.  See slides 9, 29 from my talk at ESUG: http://www.slidesearch.org/slide/spur-a-new-object-representation-for-cog.
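
The percentages can be checked with a couple of lines of arithmetic. This sketch assumes the worst case is the smallest object that needs the overflow word (255 slots) and counts the regular 8-byte header in the object's size; the function name is invented.

```python
def overflow_header_overhead(word_bytes, slots=255):
    # Fraction of the object's normal footprint taken by the extra 8-byte
    # overflow word, for an object with the given number of slots.
    base_header_bytes = 8
    overflow_bytes = 8
    return overflow_bytes / (base_header_bytes + slots * word_bytes)

print("32-bit: %.2f%%" % (100 * overflow_header_overhead(4)))
print("64-bit: %.2f%%" % (100 * overflow_header_overhead(8)))
```

Larger objects only make the fraction smaller, so the 255-slot case is the upper bound.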

Also note that in Spur the object memory need not be contiguous.  See slide 32.  It grows by adding discontiguous segments, the only requirement being that these are at a higher address than the first segment allocated, a straightforward constraint to observe in practice.

You are entitled to be confused. If you were not confused the first few
times you looked at this, you probably were not paying attention.


> - the heap readdresses 32-bit locations, so they are not the same as the addresses in memory, the numbers on the metal
> - a 32-bit address can be an immediate object, because the address can be a 30-bit number. (i.e. @00000007)
> - a 32-bit address can be a reference to a 96-bit object somewhere. The 32-bit address leads to a header that describes the object pursuant to the ObjectMemory comment [1]
>
> If these things are true, then doesn't that mean every time a number is used, then that 32-bit space, which could have been an address, is now invalid as an address to an object? If that's right, then it's a tad odd, right? The more math you do then the more @00000007 and @00000008 numbers are consumed leaving fewer, of the possible 4.3 billion 32-bit words to address objects with headers. And if that's true, then some intelligence in the VM somewhere is saying: "No, that address is working as an 'immediate object' for math. You'll have to use another 32-bit address to lead you to the 96-bits that come complete with a header".
>

You are exactly right. The address space is "wasteful". The object pointers are
pointers to byte locations within the object memory space. But the positions in
the object memory are either 32-bit words (in the current object memory) or 64-bit
words (in the Squeak "format 68002" object memory or in 64 bit Spur).

So it is indeed wasteful to use byte-addressable pointers to point to 32-bit or
64-bit locations in the object memory. But there is a hidden advantage to this
waste of address space. Any object pointer that points to a byte that is not
on a 32-bit (or 64-bit) boundary is pointing at something that cannot
possibly be an object header. In a 32-bit object memory, only one of every
four addresses can be a valid location in the object memory. All the rest of
those "wasted" addresses can be used for something else.

So what should the wasted addresses be used for? Immediate objects.

Well, Chris may be pointing out that there are no pointers into the middle of objects, and that this pointer space is somehow "wasted".  But given that the garbage collector moves objects around, any wasted 4-byte or 8-byte aligned addresses not being used to point to objects at one time may be used at another.  In practice the OS's layout of memory (e.g. only 50% of the 32-bit address space is available by default on Windows XP or 75% of the address space on Linux) is more of a limitation.

With 64 bits things change.  x86-64 implementations currently only support 48-bit virtual addresses.  That's a lot of address space.  With 64-bit processors one can say that address space is cheap.



Dave





--
best,
Eliot



Re: Immediate and heap objects

Eliot Miranda-2
In reply to this post by Bert Freudenberg
Hi Bert,

On Thu, Dec 4, 2014 at 5:05 PM, Bert Freudenberg <[hidden email]> wrote:
I just thought of a unified explanation for immediate and non-immediate objects. It somewhat inverts the notion of "normal", but maybe this way it is easier to understand?

Not bad.  Can you repost with corrections?  See below:

 

--------------------------------------------------------------

In Squeak, everything is an "object". Each object has a reference to another object defining its behavior. This is called the object's "class". Many objects can reference the same class object, they are called the class's "instances". In addition to the class reference, an object may hold other data, the so-called "instance data". The interpretation of this data is defined by the class.

Each object is stored in main memory using at least 1 machine word. Different variants of Squeak use either 32 or 64 bit words. For efficiency reasons, the storage format for an object is akin to a "Huffman code", using fewer bits and words for more common kinds of objects.

How exactly the object's bits encode the class and instance data is not visible to the user. The Virtual Machine transparently handles the details and makes all objects appear alike.

Some objects encode both the class reference and instance data in 1 word. These are called "immediate objects".

Most objects do not fit in 1 word. These have a second part dynamically allocated on the heap. They are called "heap objects".

The 1-word first part (the only word in immediates) is called an "oop". It is used to reference an object from another object's instance data.

The oop has some "tag bits" and some data bits.
 
"The tag bits encode the class, and the data bits encode the instance data."

Incorrect.  So perhaps:

"If the object fits in one word and it has a suitable class then the tag bits define the class and the data bits define the instance data.  Since there are very few tag bits, the VM only uses this tagged immediate representation for common objects like integers and characters.

If the object doesn't fit in one word the class is stored on the heap in the object's body along with its data.  This is a so-called heap object."

 
One special combination of tag bits is reserved to denote heap objects. The other combinations of tag bits correspond to different classes of immediate objects.

32-bit oops have 2 tag bits. This allows four combinations of tag bits (00, 01, 10, 11). The tags 01 and 11 are used for immediate "SmallInteger" instances, which represent signed numbers between -1073741824 and 1073741823. The tag 10 will be used in Spur for immediate Characters.

Can we say the tag 10 *is* used for Characters in Spur?  (it is).
 

64-bit oops have 3 tag bits in Spur. Only half of the 8 tags are assigned at the moment, for SmallIntegers, Characters, and SmallFloat64s.

SmallIntegers have the tag 2r001, Characters have the tag 2r010 and SmallFloat64s have the tag 2r011, leaving four unused tag values.
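
With those tag values the 64-bit dispatch can be sketched as a simple table lookup. The table and function names are invented for illustration; the tag assignments are the ones just listed, with 2r000 denoting a heap object.

```python
TAGS64 = {
    0b000: "heap object",
    0b001: "SmallInteger",
    0b010: "Character",
    0b011: "SmallFloat64",
}

def classify_oop64(oop):
    # The three low-order bits of a 64-bit oop are the tag.
    return TAGS64.get(oop & 0b111, "unassigned tag")

for oop in (0x0000000012345678, 0x0F, 0x0A, 0x0B, 0x0C):
    print(hex(oop), classify_oop64(oop))
```

Four of the eight possible tags map to nothing in this table, matching the four unused tag values.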


If all tag bits in an oop are zero, this denotes a heap object. In this case, the oop does not immediately encode the class and instance data, but instead it identifies a chunk of memory where that information is stored. Such an untagged oop is used as a direct pointer into the heap.

The memory layout of heap objects is specified by the object's class. If you're interested in that layout or the actual assignment of tag bits, read Clement's excellent post:
        https://clementbera.wordpress.com/2014/01/16/spurs-new-object-format/

--------------------------------------------------------------

Of course we normally call heap objects "regular objects", and as users we rarely have to care about the distinction anyway. But maybe when we do, explaining it the other way around is actually helpful ...

- Bert -

PS: Another idea would be to distinguish between "register objects" and "memory objects" and explaining it in terms of CPU operations, like I did in my previous attempt. Actually, that may not be such a bad idea?



--
best,
Eliot

