Spur with Immediate Floating Point Support implies a break

Eliot Miranda-2

Spur with Immediate Floating Point Support implies a break

Hi All,

Some of you have been brave enough to use Spur and may have got used to being able to update. Recently I've updated Spur with support for immediate floating-point in 64-bit Spur. Alas, these changes are not amenable to a straightforward Monticello update.

Now that I've updated Kernel.spur with these changes, you won't be able to simply update your image. There /may/ be a chance of being able to update if you first file-in MorphFloat.st (find attached). It worked for me. So, in a recent Spur image, file-in MorphFloat.st and then update. If things get stuck on a partial update of Kernel.spur-eem.867(blah).mcd, then load Kernel.spur-eem.867.mcz manually and update again. If this doesn't work, apologies.

What you can definitely do is download the latest Spur image from www.mirandabanda.org/files/Cog/SpurImages/2014-12-01 and rebuild.

best,

Eliot

Eliot Miranda-2

Re: Spur with Immediate Floating Point Support implies a break

Attachment: MorphFloat.st (6K)

Levente Uzonyi-2

Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

In reply to this post by Eliot Miranda-2

Hi Eliot,

It's a bit off-topic, but shouldn't there be a primitive that can convert
a float from the boxed representation to immediate? Something like
primNormalizePositive for LargePositiveIntegers. I know it's possible
(or at least it should be, see below) to do it with an operation which has
no effect, but a dedicated primitive looks more natural to me.

Another thing is that it seems like the VM doesn't want to create
SmallFloat64 instances at all:

1.0 class "==> BoxedFloat64"

Maybe it's just the compiler not "normalizing":

(1.0 + 0.0) class "==> BoxedFloat64"
1.0 sin class "==> BoxedFloat64"

No, the plugin doesn't "normalize" either.

Levente

Eliot Miranda-2

Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Hi Levente,

On Mon, Dec 1, 2014 at 7:52 PM, Levente Uzonyi <[hidden email]> wrote:

> Hi Eliot,
>
> It's a bit off-topic, but shouldn't there be a primitive that can convert a float from the boxed representation to immediate? Something like primNormalizePositive for LargePositiveIntegers. I know it's possible (or at least it should be, see below) to do it with an operation which has no effect, but a dedicated primitive looks more natural to me.

I was assuming that adding or subtracting positive or negative zero, or multiplying or dividing by 1.0, would do the trick. Why wouldn't this be adequate?

> Another thing is that it seems like the VM doesn't want to create SmallFloat64 instances at all:
>
> 1.0 class "==> BoxedFloat64"
>
> Maybe it's just the compiler not "normalizing":
>
> (1.0 + 0.0) class "==> BoxedFloat64"
> 1.0 sin class "==> BoxedFloat64"
>
> No, the plugin doesn't "normalize" either.

Ah, I see. Hang on. There is no support for SmallFloat64 on 32-bit Spur. Only in a 64-bit image/on a 64-bit Spur VM will you be able to create instances of SmallFloat64. And so far I only have this working in the VM simulator. I've yet to try and create a real VM, and even then it will only be a Stack VM.

> Levente

best,

Eliot

Levente Uzonyi-2

Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

On Mon, 1 Dec 2014, Eliot Miranda wrote:

> I was assuming that adding or subtracting positive or negative zero, or
> multiplying or dividing by 1.0, would do the trick. Why wouldn't this be
> adequate?

I think "x + 0.0" is adequate, but unnatural. It reminds me of
JavaScript's typecast hacks.

> Ah, I see. Hang on. There is no support for SmallFloat64 on 32-bit Spur.
> Only in a 64-bit image/on a 64-bit Spur VM will you be able to create
> instances of SmallFloat64. And so far I only have this working in the VM
> simulator. I've yet to try and create a real VM, and even then it will
> only be a Stack VM.

Wouldn't it be possible to support them in a 32-bit VM? Aren't object
headers the same in both VMs? Or is it because of the difference in
alignment?

Levente

Eliot Miranda-2

Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Hi Levente,

On Wed, Dec 3, 2014 at 3:08 PM, Levente Uzonyi <[hidden email]> wrote:

> On Mon, 1 Dec 2014, Eliot Miranda wrote:
>
>> I was assuming that adding or subtracting positive or negative zero, or multiplying or dividing by 1.0, would do the trick. Why wouldn't this be adequate?
>
> I think "x + 0.0" is adequate, but unnatural. It reminds me of JavaScript's typecast hacks.
>
>> Ah, I see. Hang on. There is no support for SmallFloat64 on 32-bit Spur.
>> Only in a 64-bit image/on a 64-bit Spur VM will you be able to create
>> instances of SmallFloat64. And so far I only have this working in the VM
>> simulator. I've yet to try and create a real VM, and even then it will
>> only be a Stack VM.
>
> Wouldn't it be possible to support them in a 32-bit VM? Aren't object headers the same in both VMs? Or is it because of the difference in alignment?

SmallFloat64 is an immediate tagged representation, like SmallInteger, so instances fit within an object pointer and have no header. In 64-bit Spur there is a 3-bit tag, leaving 61 bits. SmallFloat64 steals 3 bits from the 11-bit exponent to donate to the tags, representing a full double-precision floating-point value that is restricted to roughly the +/-10^+/-38 range. There's really no practical way to shoehorn a usable range of 64-bit floats into a 30-bit value. It's possible, but so few values would fit that the effort would be counter-productive. Does this make sense now?

> Levente

best,

Eliot
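Eliot's description can be made concrete with a short C sketch. It assumes one plausible bit layout, which may differ in detail from Spur's actual code: the double's 64 bits are rotated left by one so the sign lands in bit 0, the 11-bit biased exponent is rebased by an assumed offset of 896 (1023 - 127) so that it fits in 8 bits, and the result is shifted left past the 3-bit tag 2r011. Zero is special-cased. All function and constant names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TAG_BITS 3
#define SMALL_FLOAT_TAG 3u          /* 2r011 (assumed) */
#define EXPONENT_OFFSET 896u        /* assumed: 1023 - 127, leaves an 8-bit exponent */
#define ROTATED_EXP_SHIFT 53        /* exponent position after rotating left by 1 */

static uint64_t double_bits(double d) { uint64_t b; memcpy(&b, &d, sizeof b); return b; }
static double bits_double(uint64_t b) { double d; memcpy(&d, &b, sizeof d); return d; }
static uint64_t rotl1(uint64_t x) { return (x << 1) | (x >> 63); }
static uint64_t rotr1(uint64_t x) { return (x >> 1) | (x << 63); }

/* True for +/-0.0 or a biased exponent in 897..1151, i.e. magnitudes of
   roughly 1.2e-38 to 6.8e38. Inf, NaN, tiny and huge values stay boxed. */
static bool fits_small_float(double d)
{
    uint64_t bits = double_bits(d);
    uint64_t biased_exp = (bits >> 52) & 0x7FF;
    if ((bits << 1) == 0)
        return true;                                /* +0.0 or -0.0 */
    return biased_exp > EXPONENT_OFFSET && biased_exp <= EXPONENT_OFFSET + 255;
}

/* Sign to bit 0, rebase the exponent so the top 3 bits become 0, then tag. */
static uint64_t encode_small_float(double d)
{
    assert(fits_small_float(d));
    uint64_t rot = rotl1(double_bits(d));
    if (rot > 1)                                    /* not +/-0.0 */
        rot -= (uint64_t)EXPONENT_OFFSET << ROTATED_EXP_SHIFT;
    return (rot << TAG_BITS) | SMALL_FLOAT_TAG;     /* the shifted-out bits were 0 */
}

/* Inverse of encode_small_float: drop the tag, restore the exponent, un-rotate. */
static double decode_small_float(uint64_t oop)
{
    assert((oop & 7u) == SMALL_FLOAT_TAG);
    uint64_t rot = oop >> TAG_BITS;
    if (rot > 1)
        rot += (uint64_t)EXPONENT_OFFSET << ROTATED_EXP_SHIFT;
    return bits_double(rotr1(rot));
}
```

Under this scheme, a VM that checks every float result with something like `fits_small_float` would hand back an immediate whenever the value fits, which is the behaviour Eliot is relying on when he suggests `x + 0.0` as a normalizer in a 64-bit image.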

Levente Uzonyi-2

Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Hi Eliot,

On Wed, 3 Dec 2014, Eliot Miranda wrote:

> SmallFloat64 is an immediate tagged representation, like SmallInteger, so
> instances fit within an object pointer and have no header. In 64-bit Spur
> there is a 3-bit tag, leaving 61 bits. SmallFloat64 steals 3 bits from the
> 11-bit exponent to donate to the tags, representing a full double-precision
> floating-point value that is restricted to roughly the +/-10^+/-38 range.
> There's really no practical way to shoehorn a usable range of 64-bit floats
> into a 30-bit value. It's possible, but so few values would fit that the
> effort would be counter-productive. Does this make sense now?

I didn't mean to use 30-bit values. I meant to use the same 61-bit
representation as with the 64-bit Spur.
The object header is 64 bits long in both 32-bit and 64-bit Spur, right?
If yes, then why is it not possible to detect the tag of SmallFloat64 in a
32-bit VM, and treat the object as immediate?

About the "normalizer" primitive, I think it would be better than using
an arithmetic operation because, if I'm not mistaken, it's possible to
convert the object in place instead of creating a new one.

Levente

David T. Lewis

Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

On Thu, Dec 04, 2014 at 04:18:18AM +0100, Levente Uzonyi wrote:

>
> Hi Eliot,
>
> I didn't mean to use 30-bit values. I meant to use the same 61-bit
> representation as with the 64-bit Spur.
> The object header is 64 bits long in both 32-bit and 64-bit Spur, right?
> If yes, then why is it not possible to detect the tag of SmallFloat64 in a
> 32-bit VM, and treat the object as immediate?

Our terminology is rather confusing here, particularly if you are accustomed
to pointers in the C language.

The class comment of ObjectMemory (written by Dan Ingalls) gives a very good
explanation, but you have to realize that the "pointers" are not pointers in
the sense of the C language. They are effectively indexes into a big array of
(32 bit or 64 bit) slots that make up the object memory. An object memory
pointer is an integer value that points to an object header. In Spur, the
object header is 64 bits in size, but the object pointer (or "oop") will be
either a 64 bit integer in the 64 bit Spur object memory, or a 32 bit integer
in the 32 bit Spur object memory.

Immediate values are encoded within the object pointer, not the object header.
The trick is that, if you know in advance that object headers will be placed
on 32 bit or 64 bit boundaries within the object memory, and if the object
pointers refer to byte (not word) addresses, then you know that any object
pointer with either of the low-order two bits set (or any of the low-order three
bits in the case of a 64 bit object memory) cannot refer to any valid location in
the object memory. That is why the low-order bits are used to "tag" the immediate values.

The immediate values are thus hidden within the address space of object pointers,
not in the actual object headers. For a 32-bit object memory, where the object
pointers are 32 bits (even if the object headers are a different size), there
is no room to pack anything larger than small immediates such as SmallIntegers
(and, in 32-bit Spur, Characters); a 61-bit float cannot fit.
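Dave's alignment argument fits in a few lines of C. This sketch assumes byte-addressed oops and objects aligned to 4 bytes (32-bit) or 8 bytes (64-bit); the helper names are hypothetical and are not VMMaker or MemoryAccess methods.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of low-order bits that are always zero in an aligned byte address. */
static unsigned tag_bits_for_alignment(uint64_t alignment)
{
    unsigned bits = 0;
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0); /* power of two */
    while ((1ull << bits) < alignment)
        bits++;
    return bits;
}

/* An oop can point at a heap object only if all of its tag bits are zero;
   every other bit pattern is free to encode an immediate value instead. */
static bool may_be_heap_pointer(uint64_t oop, uint64_t alignment)
{
    return (oop & (alignment - 1)) == 0;
}

/* Dave's "index into a big array of slots": the aligned byte address
   divided by the slot size. */
static uint64_t slot_index(uint64_t oop, uint64_t alignment)
{
    assert(may_be_heap_pointer(oop, alignment));
    return oop / alignment;
}
```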

In addition to the class comment in ObjectMemory, you can also look at class
MemoryAccess from the MemoryAccess package on the VMMaker repository. This
implements the mapping of object memory pointers to lower level C pointers.
It is written in Smalltalk rather than the traditional C preprocessor macros,
so it provides a few comments that hopefully make it easier to read (and to
execute in either the VM simulator or a C debugger).

Dave

Bert Freudenberg

Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

In reply to this post by Levente Uzonyi-2

> On 04.12.2014, at 04:18, Levente Uzonyi <[hidden email]> wrote:
>
> Hi Eliot,
>
> I didn't mean to use 30-bit values. I meant to use the same 61-bit representation as with the 64-bit Spur.
> The object header is 64 bits long in both 32-bit and 64-bit Spur, right?
> If yes, then why is it not possible to detect the tag of SmallFloat64 in a 32-bit VM, and treat the object as immediate?

Because that is not what "immediate" means. There is no header, and not even an object. The value is encoded in the oop itself. You can't fit 61 bits in a 32 bit oop.

I explained this previously, but I'll paste again:

> The Squeak VM (and Cog and Spur) traditionally use 32 bits to identify an object. When you store a reference to an object into some other object, the VM actually stores a 32 bit word to some place in main memory.
>
> When you use a Float in your code, the VM actually allocates 96 bits somewhere in memory (a 32-bit header for housekeeping and 64 bits for the IEEE double) and gives you a 32-bit word back, which is a pointer to that object (we also call that an "oop"). This is called "boxing": it wraps the double inside an object. When you add two floats (say 3.0 + 4.0), the VM actually creates two objects and hands you back their oops (e.g. the two hexadecimal numbers @12345600 and @1ABCDE00). Then to add them, the VM reads 64 bits from the memory addresses 12345604 and 1ABCDE04 (skipping the object header), adds these two doubles, allocates another 96 bits in memory (say @56780000), and writes the 64 bits of the result to the address 56780004.
>
> If this sounds expensive to you, that's because it is. It is even more expensive than that because we have just created 3*96 = 288 bits of garbage that needs to be cleaned up later, otherwise we would soon run out of memory if we keep allocating. Since everything in Smalltalk is an object, that is what the VM has to do.
>
> But there is a trick. The VM uses it to avoid all this allocating and memory fetching for the most common operations, namely working with smallish integers, which are used everywhere.
>
> That trick is to hide some data in the oop itself. In the 32 bits of object pointers, the lowest two bits are actually always 0, because objects are always allocated at addresses that are a multiple of 4 (32 bits = 4 bytes). If these are always 0, we don't actually need to store them. But since there is no good way to store just 30 bits, we can also use those two bits for something else.
>
> And we do. The VM currently just uses one bit, the least significant bit (LSB). If the LSB is 0, this is a regular pointer to an object in main memory. If the LSB is 1, then the VM uses the other 31 bits to store an integer. Inside the oop itself, not at some place in memory! It does not need to be allocated, or garbage-collected. It's just there, hidden inside the 32-bit oop.
>
> This makes operations on these "small integers" extremely efficient. To add e.g. 3 and 4, the VM gets the oops @00000007 and @00000009, shifts them 1 bit to get the actual integers (7 >> 1 = 3 and 9 >> 1 = 4), adds them, shifts the sum back, sets the LSB, and answers @0000000F. All this happens in CPU registers, with no memory access needed, which is why this is so fast. Access to main memory is orders of magnitude slower than register access.
>
> We call that an "immediate object". The Squeak VM currently uses only one kind of immediate object, although there could be more, since we still have an unused bit. It would be great to speed up floating point operations, too. But there is no way to hide a 64-bit double in a 32 bit oop.
>
> Which brings us to the proposed 64-bit object format. Objects are allocated in chunks of 64 bits = 8 bytes, meaning addresses are multiples of 8, leaving the 3 lowest bits for identifying immediate objects.
>
> But there still is no way to hide a 64-bit double inside a 64-bit oop, because the VM needs at least 1 bit to distinguish between regular object pointers and immediate objects.
>
> So Eliot is proposing a 61-bit immediate Float which (just like SmallIntegers) the VM can process using register operations only. This will be a major boost for most floating point operations (as long as your values' magnitudes stay within roughly 10^-38 to 10^38).
>

- Bert -
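Bert's two scenarios can be contrasted in a toy C model. Assumptions: the "heap" is a plain array of 32-bit words, a boxed float takes three words (a placeholder header word plus the IEEE double), oops are byte offsets into that array, and a SmallInteger is tagged by shifting left one bit and setting the LSB. The names are hypothetical and the model ignores garbage collection entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ---- Boxed floats: every result is a fresh 96-bit heap object ---- */
#define HEAP_WORDS 1024
static uint32_t heap[HEAP_WORDS];
static uint32_t next_free = 1;       /* word index; start at 1 so no oop is 0 */

static uint32_t box_float(double d)
{
    assert(next_free + 3 <= HEAP_WORDS);
    uint32_t oop = next_free * 4;                  /* byte address, a multiple of 4 */
    heap[next_free] = 0;                           /* placeholder header word */
    memcpy(&heap[next_free + 1], &d, sizeof d);    /* the 64-bit IEEE double */
    next_free += 3;                                /* 3 words = 96 bits */
    return oop;
}

static double unbox_float(uint32_t oop)
{
    double d;
    memcpy(&d, &heap[oop / 4 + 1], sizeof d);      /* skip the header: oop + 4 */
    return d;
}

static uint32_t boxed_add(uint32_t a, uint32_t b)
{
    /* two memory reads plus a new allocation that a GC would reclaim later */
    return box_float(unbox_float(a) + unbox_float(b));
}

/* ---- SmallIntegers: the value lives in the oop, LSB = 1 ---- */
#define SMALLINT_MIN (-1073741824)   /* -2^30 */
#define SMALLINT_MAX 1073741823      /*  2^30 - 1 */

static uint32_t small_int_oop(int32_t v)
{
    assert(v >= SMALLINT_MIN && v <= SMALLINT_MAX);
    return ((uint32_t)v << 1) | 1u;
}

static int32_t small_int_value(uint32_t oop)
{
    /* assumes two's-complement conversion and arithmetic >> on int32_t */
    return (int32_t)oop >> 1;
}

/* Register-only add. Returns 0 (never a SmallInteger oop, whose LSB is 1)
   when the sum overflows 31 bits and would need a LargeInteger. */
static uint32_t small_int_add(uint32_t a, uint32_t b)
{
    int32_t sum = small_int_value(a) + small_int_value(b);
    if (sum < SMALLINT_MIN || sum > SMALLINT_MAX)
        return 0;
    return small_int_oop(sum);
}
```

Adding 3.0 and 4.0 this way reads memory twice and leaves 9 words (288 bits) allocated, while `small_int_add(0x7, 0x9)` answers `0xF` using only shifts and an add.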


Levente Uzonyi-2

Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Thanks Bert and Dave. I feel kinda stupid for mixing pointers with
headers. Maybe I shouldn't write mails so late in the evening...

Levente

Bert Freudenberg

Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

In reply to this post by Bert Freudenberg

On 05.12.2014, at 00:50, Chris Cunnington <[hidden email]> wrote:

> - the heap readdresses 32-bit locations, so they are not the same as the addresses in memory, the numbers on the metal
> - a 32-bit address can be an immediate object, because the address can be a 30-bit number (i.e. @00000007)
> - a 32-bit address can be a reference to a 96-bit object somewhere. The 32-bit address leads to a header that describes the object pursuant to the ObjectMemory comment [1]
>
> If these things are true, then doesn't that mean every time a number is used, that 32-bit space, which could have been an address, is now invalid as an address to an object? If that's right, then it's a tad odd, right? The more math you do, the more @00000007 and @00000008 numbers are consumed, leaving fewer of the possible 4.3 billion 32-bit words to address objects with headers.

That is exactly right, you understood perfectly fine :)

> And if that's true, then some intelligence in the VM somewhere is saying: "No, that address is working as an 'immediate object' for math. You'll have to use another 32-bit address to lead you to the 96-bits that come complete with a header."

Indeed, only one in four words is a valid address for an object. But this is actually optimal for a 32-bit processor. Its data bus is 32 bits wide. It cannot read a single byte from memory; it always reads 4 bytes, 32 bits. If you wanted to read 32 bits from the address @00000007, it would have to fetch the two 32-bit words at addresses @00000004 and @00000008 and combine 8 bits from one with 24 bits from the other. I think Intel CPUs actually do that, whereas others just say "nope, I won't do that, it's silly".

This is called “aligned” vs “unaligned” access. Unaligned access is slow, if it works at all. That is why we align all objects on addresses that are a multiple of 4 bytes.
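Bert's @00000007 example can be simulated. The sketch assumes a little-endian machine whose memory can only be fetched as aligned 32-bit words; `read_u32` is a hypothetical helper that does by hand the splicing an unaligned load would need.

```c
#include <assert.h>
#include <stdint.h>

/* The toy CPU can only fetch whole, aligned 32-bit words. */
static uint32_t fetch_aligned(const uint32_t *mem, uint32_t byte_addr)
{
    assert(byte_addr % 4 == 0);
    return mem[byte_addr / 4];
}

/* Read 32 bits at any byte address (little-endian) by fetching the one or
   two aligned words that cover it and splicing the pieces together. */
static uint32_t read_u32(const uint32_t *mem, uint32_t byte_addr)
{
    uint32_t offset = byte_addr % 4;             /* 0 means aligned */
    uint32_t base = byte_addr - offset;
    uint32_t lo_word = fetch_aligned(mem, base);
    if (offset == 0)
        return lo_word;                           /* aligned: one fetch */
    uint32_t hi_word = fetch_aligned(mem, base + 4);
    /* e.g. address 7: the top 8 bits of word @4 plus the low 24 bits of word @8 */
    return (lo_word >> (8 * offset)) | (hi_word << (32 - 8 * offset));
}
```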

This is not a waste of memory, since most objects are a multiple of 4 bytes long anyway. That's because each reference to an object is 32 bits, so all pointer objects are a multiple of 4 bytes long. The same goes for word objects. Only in byte objects may we waste 1 to 3 bytes. Wasting on average 2 bytes per string is a small price to pay for a huge gain in speed for the whole VM.
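The padding cost for byte objects is a round-up to the next multiple of 4. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of the 4-byte word size. */
static size_t padded_size(size_t nbytes)
{
    assert(nbytes <= SIZE_MAX - 3);   /* avoid wrap-around */
    return (nbytes + 3) & ~(size_t)3;
}

/* Bytes lost to padding: 0 when the length is already a multiple of 4,
   otherwise 1 to 3. */
static size_t padding_waste(size_t nbytes)
{
    return padded_size(nbytes) - nbytes;
}
```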

- Bert -


Bert Freudenberg

Immediate and heap objects

In reply to this post by Bert Freudenberg

I just thought of a unified explanation for immediate and non-immediate objects. It somewhat inverts the notion of "normal", but maybe this way it is easier to understand?

--------------------------------------------------------------

In Squeak, everything is an "object". Each object has a reference to another object defining its behavior. This is called the object's "class". Many objects can reference the same class object; these are called the class's "instances". In addition to the class reference, an object may hold other data, the so-called "instance data". The interpretation of this data is defined by the class.

Each object is stored in main memory using at least 1 machine word. Different variants of Squeak use either 32 or 64 bit words. For efficiency reasons, the storage format for an object is akin to a "Huffman code", using fewer bits and words for more common kinds of objects.

How exactly the object's bits encode the class and instance data is not visible to the user. The Virtual Machine transparently handles the details and makes all objects appear alike.

Some objects encode both the class reference and instance data in 1 word. These are called "immediate objects".

Most objects do not fit in 1 word. These have a second part dynamically allocated on the heap. They are called "heap objects".

The 1-word first part (the only word in immediates) is called an "oop". It is used to reference an object from another object's instance data.

The oop has some "tag bits" and some data bits. The tag bits encode the class, and the data bits encode the instance data. One special combination of tag bits is reserved to denote heap objects. The other combinations of tag bits correspond to different classes of immediate objects.

32-bit oops have 2 tag bits. This allows four combinations of tag bits (00, 01, 10, 11). The tags 01 and 11 are used for immediate "SmallInteger" instances, which represent signed numbers between -1073741824 and 1073741823. The tag 10 will be used in Spur for immediate Characters.

64-bit oops have 3 tag bits in Spur. Only half of the 8 tags are assigned at the moment, for SmallIntegers, Characters, and SmallFloat64s.

If all tag bits in an oop are zero, this denotes a heap object. In this case, the oop does not immediately encode the class and instance data, but instead it identifies a chunk of memory where that information is stored. Such an untagged oop is used as a direct pointer into the heap.

The memory layout of heap objects is specified by the object's class. If you're interested in that layout or the actual assignment of tag bits, read Clement's excellent post:
https://clementbera.wordpress.com/2014/01/16/spurs-new-object-format/

--------------------------------------------------------------
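The tag scheme in the text above can be read off a 32-bit oop with two bit tests. A sketch using the tag values given there; the Character payload layout (code point shifted left past the 2 tag bits) and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum oop_kind { HEAP_OBJECT, SMALL_INTEGER, CHARACTER };

/* Classify a 32-bit oop by its 2 tag bits:
   00 heap object, 01 and 11 SmallInteger, 10 Character (Spur). */
static enum oop_kind classify_oop32(uint32_t oop)
{
    if (oop & 1u)
        return SMALL_INTEGER;   /* the LSB alone decides: tags 01 and 11 */
    if (oop & 2u)
        return CHARACTER;       /* tag 10 */
    return HEAP_OBJECT;         /* tag 00: a direct pointer into the heap */
}

/* 31-bit signed payload; assumes two's-complement conversion and an
   arithmetic >> on int32_t, as on mainstream compilers. */
static int32_t small_integer_value(uint32_t oop)
{
    assert(classify_oop32(oop) == SMALL_INTEGER);
    return (int32_t)oop >> 1;
}

/* Character code above the 2 tag bits (assumed layout). */
static uint32_t character_value(uint32_t oop)
{
    assert(classify_oop32(oop) == CHARACTER);
    return oop >> 2;
}
```

The two extreme SmallInteger oops, 0x7FFFFFFF and 0x80000001, decode to 1073741823 and -1073741824, matching the range stated above.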

Of course we normally call heap objects "regular objects", and as users we rarely have to care about the distinction anyway. But maybe when we do, explaining it the other way around is actually helpful ...

- Bert -

PS: Another idea would be to distinguish between "register objects" and "memory objects" and explaining it in terms of CPU operations, like I did in my previous attempt. Actually, that may not be such a bad idea?


Eliot Miranda-2

Re: [squeak-dev] Immediate and heap objects

Hi Bert,

On Thu, Dec 4, 2014 at 5:05 PM, Bert Freudenberg <[hidden email]> wrote:

> I just thought of a unified explanation for immediate and non-immediate objects. It somewhat inverts the notion of "normal", but maybe this way it is easier to understand?

Not bad. Can you repost with corrections? See below:

> The oop has some "tag bits" and some data bits.

"The tag bits encode the class, and the data bits encode the instance data."

Incorrect. So perhaps:

"If the object fits in one word and it has a suitable class then the tag bits define the class and the data bits define the instance data. Since there are very few tag bits, the VM only uses this tagged immediate representation for common objects like integers and characters.

If the object doesn't fit in one word, the class is stored on the heap in the object's body along with its data. This is a so-called heap object."

> One special combination of tag bits is reserved to denote heap objects. The other combinations of tag bits correspond to different classes of immediate objects.
>
> 32-bit oops have 2 tag bits. This allows four combinations of tag bits (00, 01, 10, 11). The tags 01 and 11 are used for immediate "SmallInteger" instances, which represent signed numbers between -1073741824 and 1073741823. The tag 10 will be used in Spur for immediate Characters.

Can we say the tag 10 *is* used for Characters in Spur? (It is.)

> 64-bit oops have 3 tag bits in Spur. Only half of the 8 tags are assigned at the moment, for SmallIntegers, Characters, and SmallFloat64s.

SmallIntegers have the tag 2r001, Characters have the tag 2r010 and SmallFloat64s have the tag 2r011, leaving four unused tag values.

If all tag bits in an oop are zero, this denotes a heap object. In this case, the oop does not immediately encode the class and instance data, but instead it identifies a chunk of memory where that information is stored. Such an untagged oop is used as a direct pointer into the heap.

The memory layout of heap objects is specified by the object's class. If you're interested in that layout or the actual assignment of tag bits, read Clement's excellent post:
https://clementbera.wordpress.com/2014/01/16/spurs-new-object-format/

--------------------------------------------------------------

Of course we normally call heap objects "regular objects", and as users we rarely have to care about the distinction anyway. But maybe when we do, explaining it the other way around is actually helpful ...

- Bert -

PS: Another idea would be to distinguish between "register objects" and "memory objects" and to explain it in terms of CPU operations, like I did in my previous attempt. Actually, that may not be such a bad idea?
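To illustrate the "register objects" idea from the PS: two 32-bit SmallInteger oops can be added without decoding them and without touching memory, because (2x+1) + (2y+1) - 1 = 2(x+y) + 1. This is a sketch with hypothetical names, roughly the kind of few-instruction sequence a JIT might emit, not Cog's actual code. It returns 0 when an argument is not a SmallInteger or the sum no longer fits in 31 bits, which is where a real VM would take a slower path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t oop32; /* 32-bit oop with the low-bit SmallInteger tag */

/* Read a 32-bit word as two's complement without implementation-defined casts. */
static int64_t as_signed(oop32 word) {
    return (int64_t)(word & 0x7FFFFFFFu) - (int64_t)(word & 0x80000000u);
}

/* Hypothetical fast path: add two tagged SmallIntegers directly on their oops.
   Returns 1 and stores the tagged sum, or 0 if a fallback is needed. */
static int tagged_add(oop32 a, oop32 b, oop32 *result) {
    assert(result != NULL);
    if ((a & b & 1u) == 0)                          /* both tag bits must be set */
        return 0;
    int64_t sum = as_signed(a) + as_signed(b) - 1;  /* 2x+1 + 2y+1 - 1 = 2(x+y)+1 */
    if (sum < INT32_MIN || sum > INT32_MAX)         /* x+y overflows 31 bits */
        return 0;
    *result = (oop32)sum;                           /* still tagged: low bit is 1 */
    return 1;
}
```

For example, 3 (oop 2r111) plus 4 (oop 2r1001) gives oop 2r1111, which is 7, and no memory was read.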

best,

Eliot

Bert Freudenberg

Re: [squeak-dev] Immediate and heap objects

On 05.12.2014, at 06:44, Eliot Miranda <[hidden email]> wrote:

The oop has some "tag bits" and some data bits.

"The tag bits encode the class, and the data bits encode the instance data."

Incorrect. So perhaps:

"If the object fits in one word and it has a suitable class then the tag bits define the class and the data bits define the instance data. Since there are very few tag bits, the VM only uses this tagged immediate representation for common objects like integers and characters.

If the object doesn't fit in one word, the class is stored on the heap in the object's body along with its data. This is a so-called heap object."

This is more precise, yes, but I think my wording is not wrong, just simpler. The tag bits are what the VM looks at to determine the class. It’s just that if the tag bits say that this is a heap object, then the VM needs to go to the heap to find the actual class. It’s a sort of Huffman code. And on the heap the class reference is again Huffman-coded, at least in regular 32-bit Squeak: There are 5 bits for “compact” classes, or 32 bits for all other classes. But spelling all this out at that point would obscure the argument, I think ...

I agree that a word other than “encode” might be more appropriate. Would that help?
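The Huffman-like class lookup described above can be sketched as a decode that stops as early as it can: the tag bit first, then a short 5-bit compact class index in the header, and only then a full 32-bit class word. This models classic (pre-Spur) 32-bit Squeak in a simplified way; the struct and field names are hypothetical, the real header packs the compact index into bits of the base header word, and resolving a heap oop to its header is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t oop32; /* classic 32-bit oop: low bit 1 = SmallInteger */

#define COMPACT_CLASS_SLOTS 32 /* a 5-bit index gives 0..31 */

/* Hypothetical, simplified header of the object a heap oop points at. */
typedef struct {
    unsigned compact_index; /* 0 = not compact; 1..31 index the compact class table */
    oop32 class_word;       /* full class reference, used when compact_index == 0 */
} object_header;

/* Hypothetical VM tables; slot 0 of compact_classes is unused. */
typedef struct {
    oop32 small_integer_class;
    oop32 compact_classes[COMPACT_CLASS_SLOTS];
} class_tables;

static oop32 class_of(oop32 oop, const object_header *header, const class_tables *tables) {
    if (oop & 1u)                      /* shortest code: the tag bit alone */
        return tables->small_integer_class;
    assert(header != NULL);            /* heap object: consult its header */
    if (header->compact_index != 0) {  /* medium code: 5-bit compact index */
        assert(header->compact_index < COMPACT_CLASS_SLOTS);
        return tables->compact_classes[header->compact_index];
    }
    return header->class_word;         /* longest code: full class word */
}
```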

One special combination of tag bits is reserved to denote heap objects. The other combinations of tag bits correspond to different classes of immediate objects.

32-bit oops have 2 tag bits. This allows four combinations of tag bits (00, 01, 10, 11). The tags 01 and 11 are used for immediate "SmallInteger" instances, which represent signed numbers between -1073741824 and 1073741823. The tag 10 will be used in Spur for immediate Characters.

Can we say the tag 10 *is* used for Characters in Spur? (it is).

Oops. Sure. To me Spur is still in the future ;)

64-bit oops have 3 tag bits in Spur. Only half of the 8 tags are assigned at the moment, for SmallIntegers, Characters, and SmallFloat64s.

SmallIntegers have the tag 2r001, Characters have the tag 2r010 and SmallFloat64s have the tag 2r011, leaving four unused tag values.

Well, I didn’t want to go into too many details, but still give an example, which is why I spelled out the tags for 32-bit oops. The exact assignment of tags is not important to understand the basic concept, and Clement’s post has all the details.

- Bert -


Ben Coman

Re: [squeak-dev] Immediate and heap objects

Bert Freudenberg wrote:

> On 05.12.2014, at 06:44, Eliot Miranda <[hidden email]> wrote:
>>
>> The oop has some "tag bits" and some data bits.
>>
>> "The tag bits encode the class, and the data bits encode the
>> instance data."
>>

How about "For immediate objects, the tag bits encode the class..."

>>
>> Incorrect. So perhaps:
>>
>> "If the object fits in one word and it has a suitable class then the
>> tag bits define the class and the data bits define the instance data.
>> Since there are very few tag bits, the VM only uses this tagged
>> immediate representation for common objects like integers and characters.
>>
>> If the object doesn't fit in one word, the class is stored on the heap
>> in the object's body along with its data. This is a so-called heap
>> object."
>
> This is more precise, yes, but I think my wording is not wrong, just
> simpler. The tag bits are what the VM looks at to determine the class.
> It’s just that if the tag bits say that this is a heap object, then the
> VM needs to go to the heap to find the actual class. It’s a sort of
> Huffman code. And on the heap the class reference is again
> Huffman-coded, at least in regular 32-bit Squeak: There are 5 bits for
> “compact” classes, or 32 bits for all other classes. But spelling all
> this out at that point would obscure the argument, I think ...

I think the reference to Huffman coding is distracting. I probably once
knew what it was, but now I had to look it up and the few pages I looked
at did not add any value in the first few paragraphs.
cheers -ben

Bert Freudenberg

Re: [squeak-dev] Immediate and heap objects

> I think the reference to Huffman coding is distracting.

I may just call it "variable-length encoding". That's better than referring to Huffman, yes.

- Bert -


Ben Coman

Re: [squeak-dev] Immediate and heap objects

Bert Freudenberg wrote:

>> I think the reference to Huffman coding is distracting.
>
> I may just call it "variable-length encoding". That's better than referring to Huffman, yes.
>
> - Bert -

yes.

Louis LaBrunda

[squeak-dev] Immediate and heap objects

On Fri, 05 Dec 2014 21:41:51 +0800, Ben Coman <[hidden email]> wrote:

>
>Bert Freudenberg wrote:
>>> I think the reference to Huffman coding is distracting.
>>
>> I may just call it "variable-length encoding". That's better than referring to Huffman, yes.
>>
>> - Bert -
>
>yes.

Maybe put Huffman in parentheses (Huffman), as it doesn't hurt to encourage a
little learning of the science and history.

Lou
-----------------------------------------------------------
Louis LaBrunda
Keystone Software Corp.
SkypeMe callto://PhotonDemon
mailto:[hidden email] http://www.Keystone-Software.com

Chris Cunnington-4

Re: [squeak-dev] Immediate and heap objects

In reply to this post by Bert Freudenberg

People learn differently, but this description is working for me. And not having to look up what Huffman code means would be a help.

Chris

On Dec 5, 2014, at 7:02 AM, Bert Freudenberg <[hidden email]> wrote:
