Re: [Pharo-project] Plan/discussion/communication around new object format


Re: [Pharo-project] Plan/discussion/communication around new object format

Eliot Miranda-2
 


On Wed, May 30, 2012 at 12:59 PM, Stéphane Ducasse <[hidden email]> wrote:
I would like to be sure that we can have
       - bit for immutable objects
       - bits for experimenting.

There will be quite a few.  And one will be able to steal bits from the class field if one needs fewer classes.  I'm not absolutely sure of the layout yet.  But for example

8: slot size (255 => extra header word with large size)
3: odd bytes/fixed fields (odd bytes for non-pointer, fixed fields for pointer, 7 => # fixed fields is in the class)
4: format (pointers, indexable, bytes/shorts/longs/doubles indexable, compiled method, ephemerons, weak, etc.)
1: immutability
3: GC (2 mark bits, 1 forwarded bit)
20: identity hash
20: class index

still leaves 5 bits unused.  And stealing 4 bits each from the class index and identity hash fields would still leave 64k classes.  So this format is simple and provides lots of unused bits.  The format field is a great idea as it combines a number of orthogonal properties in very few bits.  I don't want to include odd bytes in the format because I think a separate field that holds odd bytes and fixed fields is a better use of space.  But we can gather statistics before deciding.
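For concreteness, the sketched 64-bit header can be modelled with mask-and-shift accessors.  This is only an illustration: the field positions and names below are invented, since the layout above is explicitly not final, and the odd-bytes, immutability and GC fields are omitted from the constructor for brevity.

```c
#include <stdint.h>

/* Illustrative packing of the sketched 64-bit header.  Bit positions
   are assumptions; the design says the layout is not yet fixed. */
#define SLOT_SIZE_SHIFT   0   /* 8 bits: 255 => overflow size word   */
#define ODD_BYTES_SHIFT   8   /* 3 bits: odd bytes / fixed fields    */
#define FORMAT_SHIFT     11   /* 4 bits                              */
#define IMMUTABLE_SHIFT  15   /* 1 bit                               */
#define GC_BITS_SHIFT    16   /* 3 bits: 2 mark + 1 forwarded        */
#define IDENT_HASH_SHIFT 19   /* 20 bits                             */
#define CLASS_IDX_SHIFT  39   /* 20 bits; bits 59-63 unused          */

static uint64_t field(uint64_t header, int shift, int width) {
    return (header >> shift) & ((1ULL << width) - 1);
}

static uint64_t makeHeader(uint64_t classIndex, uint64_t identityHash,
                           uint64_t format, uint64_t slotSize) {
    return (slotSize     << SLOT_SIZE_SHIFT)
         | (format       << FORMAT_SHIFT)
         | (identityHash << IDENT_HASH_SHIFT)
         | (classIndex   << CLASS_IDX_SHIFT);
}
```

With these widths 59 of the 64 bits are accounted for, matching the five spare bits mentioned above.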


Stef

On May 30, 2012, at 8:48 AM, Stéphane Ducasse wrote:

> Hi guys
>
> Here is an important topic I would like to see discussed so that we see how we can improve and join forces.
> Maybe a mail discussion and then a wiki for the summary would be good.
>
>
> stef
>
>
>
> Begin forwarded message:
>
>> From: Eliot Miranda <[hidden email]>
>> Subject: Re: Plan/discussion/communication around new object format
>> Date: May 27, 2012 10:49:54 PM GMT+02:00
>> To: Stéphane Ducasse <[hidden email]>
>>
>>
>>
>> On Sat, May 26, 2012 at 1:46 AM, Stéphane Ducasse <[hidden email]> wrote:
>> Hi eliot
>>
>> do you have a description of the new object format you want to introduce?
>>
>> The design is in the class comment of CogMemoryManager in the Cog VMMaker packages.
>>
>> Then what is your schedule?
>>
>> This is difficult. I have made a small start and should be able to spend time on it starting soon.  I want to have it finished by early next year.  But it depends on work schedules etc.
>>
>>
>> I would like to see if we can allocate igor/esteban time before we run out of money
>> to help on that important topic.
>> Right now the solution is unclear and I did not see any document where we can evaluate
>> and plan how we can help. So do you want help on that topic? And how can people
>> contribute, if at all?
>>
>> The first thing to do is to read the design document, to see if the Pharo community thinks it is the right direction, and to review it, spot deficiencies etc.  So please get those interested to read the class comment of CogMemoryManager in the latest VMMaker.oscog.
>>
>> Here's the current version of it:
>>
>> CogMemoryManager is currently a place-holder for the design of the new Cog VM's object representation and garbage collector.  The goals for the GC are
>>
>> - efficient object representation a la Eliot Miranda's VisualWorks 64-bit object representation that uses a 64-bit header, eliminating direct class references so that all objects refer to their classes indirectly.  Instead the header contains a constant class index, in a field smaller than a full pointer.  These class indices are used in inline and first-level method caches, hence they do not have to be updated on GC (although they do have to be traced to be able to GC classes).  Classes are held in a sparse weak table.  The class table needs only to be indexed by an instance's class index in class hierarchy search, in the class primitive, and in tracing live objects in the heap.  The additional header space is allocated to a much expanded identity hash field, reducing hash efficiency problems in identity collections due to the extremely small (11 bit) hash field in the old Squeak GC.  The identity hash field is also a key element of the class index scheme.  A class's identity hash is its index into the class table, so to create an instance of a class one merely copies its identity hash into the class index field of the new instance.  This implies that when classes gain their identity hash they are entered into the class table and their identity hash is that of a previously unused index in the table.  It also implies that there is a maximum number of classes in the table.  At least for a few years 64k classes should be enough.  A class is entered into the class table in the following operations:
>>      behaviorHash
>>      adoptInstance
>>      instantiate
>>      become  (i.e. if an old class becomes a new class)
>>              if the target's class field = the original's id hash
>>                 and replacement's id hash is zero
>>                      enter replacement in class table
>> behaviorHash is a special version of identityHash that must be implemented in the image by any object that can function as a class (i.e. Behavior).
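A minimal sketch of this scheme, with invented C names: behaviorHash lazily enters a class into the table at a fresh index, and instantiation is just a copy of that hash into the new instance's class-index field.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch of the class-table scheme: a class's identity
   hash doubles as its index in the table.  All names are invented. */
#define TABLE_SIZE (1 << 16)       /* "64k classes should be enough" */

typedef struct Obj { uint32_t classIndex; uint32_t identityHash; } Obj;

static Obj *classTable[TABLE_SIZE];
static uint32_t nextFreeIndex = 8; /* indices 0-7 reserved for special uses */

static uint32_t behaviorHash(Obj *aClass) {
    if (aClass->identityHash == 0) {            /* first use as a class:   */
        aClass->identityHash = nextFreeIndex++; /* claim an unused slot    */
        classTable[aClass->identityHash] = aClass;
    }
    return aClass->identityHash;
}

static Obj instantiate(Obj *aClass) {
    Obj inst = {0, 0};
    inst.classIndex = behaviorHash(aClass); /* copy hash into class field */
    return inst;
}

static Obj *classOf(Obj *inst) {            /* the class primitive        */
    return classTable[inst->classIndex];
}
```

Fetching an instance's class is then a single indexed load, and the table (a real one would be weak) only needs to be consulted in hierarchy search, the class primitive, and GC tracing.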
>>
>> - more immediate classes.  An immediate Character class would speed up String accessing, especially for WideString, since no instantiation needs to be done on at:put: and no dereference need be done on at:.  In a 32-bit system tag checking is complex since it is thought important to retain 31-bit SmallIntegers.  Hence, as in current Squeak, the least significant bit set implies a SmallInteger, but Characters would likely have a tag pattern of xxx10.  Hence masking with binary 11 results in two values for SmallInteger, xxx01 and xxx11.  30-bit characters are more than adequate for Unicode.  In a 64-bit system we can use the full three bits and usefully implement an immediate Float.  As in VisualWorks a functional representation takes three bits away from the exponent.  Rotating to put the sign bit in the least significant non-tag bit makes expanding and contracting the 8-bit exponent to the 11-bit IEEE double exponent easy and makes comparing negative and positive zero easier (an immediate Float is zero if its unsigned 64 bits are < 16).  So the representation looks like
>>      | 8 bit exponent | 52 bit mantissa | sign bit | 3 tag bits |
>> For details see "60-bit immediate Floats" below.
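The 32-bit tag tests described above can be sketched as follows.  The Character accessors are assumptions extrapolated from the xxx10 pattern mentioned; only the SmallInteger convention (low bit set) is firm in the text.

```c
#include <stdint.h>

/* Sketch of the proposed 32-bit tagging: any oop with the least
   significant bit set is a 31-bit SmallInteger; an oop whose low two
   bits are 10 would be an immediate 30-bit Character (assumption). */
typedef uintptr_t oop;

static int isSmallInteger(oop o) { return (o & 1) != 0; }   /* xxx1 */
static int isCharacter(oop o)    { return (o & 3) == 2; }   /* xx10 */

static oop      tagSmallInteger(intptr_t v) { return ((oop)v << 1) | 1; }
static intptr_t smallIntegerValue(oop o)    { return (intptr_t)o >> 1; }

static oop      tagCharacter(uint32_t codePoint) { return ((oop)codePoint << 2) | 2; }
static uint32_t characterValue(oop o)            { return (uint32_t)(o >> 2); }
```

Note how masking with binary 11 sends both SmallInteger patterns (01 and 11) to odd values, while Characters land exactly on 10.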
>>
>>
>> - efficient scavenging.  The current Squeak GC uses a slow pointer-reversal collector that writes every field in live objects three times in each collection, twice in the pointer-reversing heap traversal to mark live objects and once to update the pointer to its new location.  A scavenger writes every field of live data twice in each collection, once as it does a block copy of the object when copying to to space, once as it traverses the live pointers in the to space objects.  Of course the block copy is a relatively cheap write.
>>
>> - lazy become.  The JIT's use of inline caching provides a cheap way of avoiding scanning the heap as part of a become (which is the simple approach to implementing become in a system with direct pointers).  A becomeForward: on a (set of) non-zero-sized object(s) turns the object into a "corpse" or "forwarding object" whose first (non-header) word/slot is replaced by a pointer to the target of the becomeForward:.  The corpse's class index is set to one that identifies corpses and, because it is a hidden class index, will always fail an inline cache test.  The inline cache failure code is then responsible for following the forwarding pointer chain (these are Iliffe vectors :) ) and resolving to the actual target.  We have yet to determine exactly how this is done (e.g. change the receiver register and/or stack contents and retry the send, perhaps scanning the current activation).  See below on how we deal with becomes on objects with named inst vars.  Note that we probably don't have to worry about zero-sized objects.  These are unlikely to be passed through the FFI (there is nothing to pass :) ) and so will rarely be becommed.  If they are, they can become slowly.  Alternatively we can insist that objects are at least 16 bytes in size (see 8-byte alignment below) so that there will always be space for a forwarding pointer.  Since none of the immediate classes can have non-immediate instances, and since we allocate the immediate class indices corresponding to their tag patterns (SmallInteger = 1, Character = 3, SmallFloat = 4?), we can use all the class indices from 0 to 7 for special uses, e.g. 0 = forward and 1 = header-sized filler.
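A sketch of the corpse/forwarding mechanics under the stated assumptions (class index 0 = forward; the first slot of a corpse holds the forwarding pointer); structure and field names are invented for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical corpse/forwarding-object sketch.  A becomeForward:
   overwrites the first slot with a pointer to the target and sets a
   reserved class index; inline caches always miss on that index and
   the miss path follows the chain to the live object. */
enum { ForwardedClassIndex = 0 };

typedef struct Obj {
    uint32_t classIndex;
    struct Obj *firstSlot;  /* holds the forwarding pointer in a corpse */
} Obj;

static int isForwarded(Obj *o) { return o->classIndex == ForwardedClassIndex; }

static void becomeForward(Obj *obj, Obj *target) {
    obj->classIndex = ForwardedClassIndex;  /* hidden index: caches miss */
    obj->firstSlot  = target;
}

static Obj *followForwarding(Obj *o) {
    while (isForwarded(o))                  /* chains can be > 1 deep */
        o = o->firstSlot;
    return o;
}
```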
>>
>> - pinning.  To support a robust and easy-to-use FFI the memory manager must support temporary pinning, where individual objects can be prevented from being moved by the GC for as long as required, either by being one of an in-progress FFI call's arguments, or by having pinning asserted by a primitive (allowing objects to be passed to external code that retains a reference to the object after returning).  Pinning probably implies a per-object "is-pinned" bit in the object header.  Pinning will be done via lazy become; i.e. an object in new space will be becommed into a pinned object in old space.  We will only support pinning in old space.
>>
>> - efficient old space collection.  An incremental collector (a la Dijkstra's three colour algorithm) collects old space, e.g. via an amount of tracing being hung off scavenges and/or old space allocations at an adaptive rate that keeps full garbage collections to a minimum.  (see free space/free list below)
>>
>> - 8-byte alignment.  It is advantageous for the FFI, for floating-point access, for object movement and for 32/64-bit compatibility to keep object sizes in units of 8 bytes.  For the FFI, 8-byte alignment means passing objects to code that expects that requirement (such as modern x86 numeric processing instructions).  This implies that
>>      - the starts of all spaces are aligned on 8-byte boundaries
>>      - object allocation rounds up the requested size to a multiple of 8 bytes
>>      - the overflow size field is also 8 bytes
>> We shall probably keep the minimum object size at 16 bytes so that there is always room for a forwarding pointer.  But this implies that we will need to implement an 8-byte filler to fill holes between objects > 16 bytes whose length mod 16 bytes is 8 bytes and following pinned objects.  We can do this using a special class index, e.g. 1, so that the method that answers the size of an object looks like, e.g.
>>      chunkSizeOf: oop
>>              <var: #oop type: #'object *'>
>>              ^oop classIndex = 1
>>                      ifTrue: [BaseHeaderSize]
>>                      ifFalse: [BaseHeaderSize
>>                                + (oop slotSize = OverflowSlotSize
>>                                              ifTrue: [OverflowSizeBytes]
>>                                              ifFalse: [0])
>>                                + (oop slotSize * BytesPerSlot)]
>>
>>      chunkStartOf: oop
>>              <var: #oop type: #'object *'>
>>              ^(self cCoerceSimple: oop to: #'char *')
>>                      - ((oop classIndex = 1
>>                          or: [oop slotSize ~= OverflowSlotSize])
>>                                      ifTrue: [0]
>>                                      ifFalse: [OverflowSizeBytes])
>>
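The allocation rounding implied by these constraints can be sketched as below; the constants follow the text (8-byte units, 16-byte minimum so a forwarding pointer always fits), but the function name is invented.

```c
#include <stdint.h>

/* Round a requested byte size up to the 8-byte allocation unit,
   enforcing the 16-byte minimum object size discussed above. */
#define ALLOC_UNIT       8u
#define MIN_OBJECT_BYTES 16u

static uint32_t roundedAllocationSize(uint32_t requestedBytes) {
    uint32_t rounded = (requestedBytes + (ALLOC_UNIT - 1)) & ~(ALLOC_UNIT - 1);
    return rounded < MIN_OBJECT_BYTES ? MIN_OBJECT_BYTES : rounded;
}
```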
>> For the moment we do not tackle the issue of heap growth and shrinkage with the ability to allocate and deallocate heap segments via memory-mapping.  This technique allows space to be released back to the OS by unmapping empty segments.  We may revisit this but it is not a key requirement for the first implementation.
>>
>> The basic approach is to use a fixed size new space and a growable old space.  The new space is a classic three-space nursery a la Ungar's Generation Scavenging, a large eden for new objects and two smaller survivor spaces that exchange roles on each collection, one being the to space to which surviving objects are copied, the other being the from space of the survivors of the previous collection, i.e. the previous to space.
>>
>> To provide apparent pinning in new space we rely on lazy become.  Since most pinned objects will be byte data and these do not require stack zone activation scanning, the overhead is simply an old space allocation and corpsing.
>>
>> To provide pinning in old space, large objects are implicitly pinned (because it is expensive to move large objects and, because they are both large and relatively rare, they contribute little to overall fragmentation - as in aggregates, small objects can be used to fill in the spaces between large objects).  Hence, objects above a particular size are automatically allocated in old space, rather than new space.  Small objects are pinned as per objects in new space, by asserting the pin bit, which will be set automatically when allocating a large object.  As a last resort, or by programmer control (the fullGC primitive) old space is collected via mark-sweep (mark-compact) and so the mark phase must build the list of pinned objects around which the sweep/compact phase must carefully step.
>>
>> Free space in old space is organized by a free list/free tree as in Eliot's VisualWorks 5i old space allocator.  There are 64 free lists, indices 1 through 63 holding blocks of space of that size, index 0 holding a semi-balanced ordered tree of free blocks, each node being the head of the list of free blocks of that size.  At the start of the mark phase the free list is thrown away and the sweep phase coalesces free space and steps over pinned objects as it proceeds.  We can reuse the forwarding pointer compaction scheme used in the old collector.  Incremental collections merely move unmarked objects to the free lists (as well as nilling weak pointers in weak arrays and scheduling them for finalization).  The occupancy of the free lists is represented by a bitmap in a 64-bit integer so that an allocation of size 63 or less can know whether there exists a free chunk of that size, but more importantly can know whether a free chunk larger than it exists in the fixed-size free lists without having to search all larger free list heads.
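The occupancy-bitmap idea can be sketched as follows, assuming a GCC/Clang-style __builtin_ctzll to find the lowest set bit; the function names are invented.

```c
#include <stdint.h>

/* Sketch of the 64-bit free-list occupancy bitmap: bit n set means the
   list of free chunks of size-index n is non-empty.  Finding a chunk of
   size >= n is then a mask plus count-trailing-zeros, with no linear
   search of the larger list heads. */
static uint64_t occupancy;

static void noteListNonEmpty(int sizeIndex) { occupancy |=  (1ULL << sizeIndex); }
static void noteListEmpty(int sizeIndex)    { occupancy &= ~(1ULL << sizeIndex); }

/* Answer the smallest size-index >= n with a non-empty list, or -1
   (meaning the allocator must fall back to the tree at index 0). */
static int indexOfFreeChunkOfSizeAtLeast(int n) {
    uint64_t candidates = occupancy & ~((1ULL << n) - 1); /* drop bits < n */
    if (candidates == 0) return -1;
    return __builtin_ctzll(candidates);                   /* lowest set bit */
}
```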
>>
>> The incremental collector (a la Dijkstra's three colour algorithm) collects old space via an amount of tracing being hung off scavenges and/or old space allocations at an adaptive rate that keeps full garbage collections to a minimum.  [N.B. Not sure how to do this yet.  The incremental collector needs to complete a pass often enough to reclaim objects, but infrequently enough not to waste time.  So some form of feedback should work.  In VisualWorks tracing is broken into quanta of work where image-level code determines the size of a quantum based on how fast the machine is, and how big the heap is.  This code could easily live in the VM, controllable through vmParameterAt:put:.  An alternative would be to use the heartbeat to bound quanta by time.  But in any case some amount of incremental collection would be done on old space allocation and scavenging, the amount being chosen to keep pause times acceptably short, and at a rate to reclaim old space before a full GC is required, i.e. at a rate proportional to the growth in old space].  The incremental collector is a state machine, being either marking, nilling weak pointers, or freeing.  If nilling weak pointers is not done atomically then there must be a read barrier in weak array at: so that reading from an old space weak array holding stale un-nilled references to unmarked objects does not answer those objects.  Tricks such as including the weak bit in bounds calculations can make this cheap for non-weak arrays.  Alternatively nilling weak pointers can be made an atomic part of incremental collection, which can be made cheaper by maintaining the set of weak arrays (e.g. on a list).
>>
>> The incremental collector implies a more complex write barrier.  Objects are of three colours: black, having been scanned; grey, being scanned; and white, unreached.  A mark stack holds the grey objects.  If the incremental collector is marking and an unmarked white object is stored into a black object then the stored object must become grey, being added to the mark stack.  So the write barrier is essentially
>>      target isYoung ifFalse:
>>              [newValue isYoung
>>                      ifTrue: [target isInRememberedSet ifFalse:
>>                                      [target addToRememberedSet]] "target now refers to a young object; it is a root for scavenges"
>>                      ifFalse:
>>                              [(target isBlack
>>                                and: [igc marking
>>                                and: [newValue isWhite]]) ifTrue:
>>                                      [newValue beGrey]]] "add newValue to IGC's markStack for subsequent scanning"
>>
>> The incremental collector does not detect already marked objects all of whose references have been overwritten by other stores (e.g. in the above if newValue overwrites the sole remaining reference to a marked object).  So the incremental collector only guarantees to collect all garbage created in cycle N at the end of cycle N + 1.  The cost is hence slightly worse memory density but the benefit, provided the IGC works hard enough, is the elimination of long pauses due to full garbage collections, which become actions of last resort or programmer desire.
>>
>> Lazy become.
>>
>> As described earlier the basic idea behind lazy become is to use corpses (forwarding objects) that are followed lazily during GC and inline cache miss.  However, a lazy scheme cannot be used on objects with named inst vars without adding checking to all inst var accesses, which we judge too expensive.  Instead, when becomming objects with named inst vars, we scan all activations in the stack zone, eagerly becomming these references, and we check for corpses when faulting in a context into the stack zone.  Essentially, the invariant is that there are no references to corpses from the receiver slots of stack activations.  A detail is whether we allow or forbid pinning of closure indirection vectors, or scan the entire stack of each activation.  Using a special class index pun for indirection vectors is a cheap way of preventing their becomming/pinning etc.  Although "don't do that" (don't attempt to pin/become indirection vectors) is also an acceptable response.
>>
>> 60-bit immediate Floats
>> Representation for immediate doubles, only used in the 64-bit implementation.  Immediate doubles have the same 52 bit mantissa as IEEE double-precision floating-point, but only have 8 bits of exponent.  So they occupy just less than the middle 1/8th of the double range.  They overlap the normal single-precision floats, which also have 8 bit exponents, but exclude the single-precision denormals (exponent -127) and the single-precision NaNs (exponent +127).  +/- zero is just a pair of values with both exponent and mantissa 0.
>> So the non-zero immediate doubles range from
>>         +/- 0x3800,0000,0000,0001 / 5.8774717541114d-39
>> to      +/- 0x47ff,ffff,ffff,ffff / 6.8056473384188d+38
>> The encoded tagged form has the sign bit moved to the least significant bit, which allows for faster encode/decode because offsetting the exponent can't overflow into the sign bit and because testing for +/- 0 is an unsigned compare for <= 0xf:
>>     msb                                                                                        lsb
>>     [8 exponent subset bits][52 mantissa bits ][1 sign bit][3 tag bits]
>> So assuming the tag is 5, the tagged non-zero bit patterns are
>>              0x0000,0000,0000,001[d/5]
>> to           0xffff,ffff,ffff,fff[d/5]
>> and +/- 0d is 0x0000,0000,0000,000[d/5]
>> Encode/decode of non-zero values in machine code looks like:
>>                      msb                                  lsb
>> Decode:              [8expsubset][52mantissa][1s][3tags]
>> shift away tags:     [ 000 ][8expsubset][52mantissa][1s]
>> add exponent offset: [     11 exponent     ][52mantissa][1s]
>> rot sign:            [1s][     11 exponent     ][52mantissa]
>>
>> Encode:              [1s][     11 exponent     ][52mantissa]
>> rot sign:            [     11 exponent     ][52mantissa][1s]
>> sub exponent offset: [ 000 ][8expsubset][52mantissa][1s]
>> shift:               [8expsubset][52mantissa][1s][ 000 ]
>> or/add tags:         [8expsubset][52mantissa][1s][3tags]
>> but is slower in C because
>> a) there is no rotate, and
>> b) raw conversion between double and quadword must (at least in the source) move bits through memory ( quadword = *(q64 *)&doubleVariable).
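A C sketch of the encode/decode steps above, with the rotates emulated by shifts (as the text notes, C lacks a rotate) and the exponent re-biased by 896 = 0x380, the bottom of the immediate exponent range quoted above.  It assumes the tag value 5 and that the input double is representable as an immediate (exponent within the 8-bit subset); out-of-range values are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of 60-bit immediate Float encode/decode.  0x380 << 53 is the
   re-bias between the 8-bit exponent subset and the 11-bit IEEE bias,
   applied after untagging (sign in bit 0, exponent from bit 53). */
#define FLOAT_TAG  5ULL
#define EXP_OFFSET (896ULL << 53)

static uint64_t ror64(uint64_t x, int n) { return (x >> n) | (x << (64 - n)); }
static uint64_t rol64(uint64_t x, int n) { return (x << n) | (x >> (64 - n)); }

static uint64_t doubleBits(double d) { uint64_t q; memcpy(&q, &d, 8); return q; }
static double bitsAsDouble(uint64_t q) { double d; memcpy(&d, &q, 8); return d; }

static uint64_t encodeImmediateFloat(double d) {
    uint64_t q = rol64(doubleBits(d), 1);  /* rot sign to bit 0          */
    if ((q >> 1) != 0) q -= EXP_OFFSET;    /* non-zero: sub exp offset   */
    return (q << 3) | FLOAT_TAG;           /* shift, or in the tags      */
}

static double decodeImmediateFloat(uint64_t imm) {
    uint64_t q = imm >> 3;                 /* shift away tags            */
    if ((q >> 1) != 0) q += EXP_OFFSET;    /* non-zero: add exp offset   */
    return bitsAsDouble(ror64(q, 1));      /* rot sign back to bit 63    */
}
```

Note that +0.0 and -0.0 encode to 0x5 and 0xD respectively, so the "is zero" test is indeed an unsigned compare against 16.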
>>
>> Issues:
>> How do we avoid the Size4Bit for 64-bits?  The format word encodes the number of odd bytes, but currently has only 4 bits and hence only supports odd bytes of 0 - 3.  We need odd bytes of 0 - 7.  But I don't like the separate Size4Bit.  Best to change the VI code and have a 5 bit format?  We lose one bit but save two bits (isEphemeron and isWeak (or three, if isPointers)) for a net gain of one (or two).
>>
>> Further, keep Squeak's format idea or go for separate bits?  For 64-bits we need a 5 bit format field.  This contrasts with isPointers, isWeak, isEphemeron, and 3 odd size bits (or a byte size).  The format field is quite economical.
>>
>> Are class indices in inline caches strong references to classes or weak references?
>> If strong then they must be scanned during GC and the methodZone must be flushed on fullGC to reclaim all classes (this looks to be a bug in the V3 Cogit).
>> If weak, then when the class table loses references, PICs containing freed classes must be freed, and sends to freed PICs or to caches containing freed classes must be unlinked.
>> The second approach is faster: the common case is scanning the class table; the uncommon case is freeing classes.  It is also better in that in-line caches do not prevent the reclamation of classes.
>>
>>
>> Stef
>>
>>
>>
>> --
>> best,
>> Eliot
>>
>





--
best,
Eliot


Re: [Pharo-project] Plan/discussion/communication around new object format

Mariano Martinez Peck
 


On Wed, May 30, 2012 at 10:22 PM, Eliot Miranda <[hidden email]> wrote:
 


On Wed, May 30, 2012 at 12:59 PM, Stéphane Ducasse <[hidden email]> wrote:
I would like to be sure that we can have
       - bit for immutable objects
       - bits for experimenting.

There will be quite a few.  And one will be able to steal bits from the class field if one needs fewer classes.  I'm not absolutely sure of the layout yet.  But for example

8: slot size (255 => extra header word with large size)
3: odd bytes/fixed fields (odd bytes for non-pointer, fixed fields for pointer, 7 => # fixed fields is in the class)
4 bits: format (pointers, indexable, bytes/shorts/longs/doubles indexable, compiled method, ephemerons, weak, etc)
1: immutability
3: GC 2 mark bits. 1 forwarded bit
20: identity hash

and we can make it lazy, that is, we compute it not at instantiation time but rather the first time the primitive is called.

 
20: class index

This would probably work for a while. I think it would be good to leave an "open door" so that in the future we can just add one more word for a class pointer.
 

still leaves 5 bits unused.  And stealing 4 bits each from class index still leaves 64k classes.  So this format is simple and provides lots of unused bits.  The format field is a great idea as it combines a number of orthogonal properties in very few bits.  I don't want to include odd bytes in format because I think a separate field that holds odd bytes and fixed fields is better use of space.  But we can gather statistics before deciding.


Stef

On May 30, 2012, at 8:48 AM, Stéphane Ducasse wrote:

> Hi guys
>
> Here is an important topic I would like to see discussed so that we see how we can improve and join forces.
> May a mail discussion then a wiki for the summary would be good.
>
>
> stef
>
>
>
> Begin forwarded message:
>
>> From: Eliot Miranda <[hidden email]>
>> Subject: Re: Plan/discussion/communication around new object format
>> Date: May 27, 2012 10:49:54 PM GMT+02:00
>> To: Stéphane Ducasse <[hidden email]>
>>
>>
>>
>> On Sat, May 26, 2012 at 1:46 AM, Stéphane Ducasse <[hidden email]> wrote:
>> Hi eliot
>>
>> do you have a description of the new object format you want to introduce?
>>
>> The design is in the class comment of CogMemoryManager in the Cog VMMaker packages.
>>
>> Then what is your schedule?
>>
>> This is difficult. I have made a small start and should be able to spend time on it starting soon.  I want to have it finished by early next year.  But it depends on work schedules etc.
>>
>>
>> I would like to see if we can allocate igor/esteban time before we run out of money
>> to help on that important topic.
>> Now the solution is unclear and I did not see any document where we can evaluate
>> and plan how we can help. So do you want help on that topic? Then how can people
>> contribute if any?
>>
>> The first thing to do is to read the design document, to see if the Pharo community thinks it is the right direction, and to review it, spot deficiencies etc.  So please get those interested to read the class comment of CogMemoryManager in the latest VMMaker.oscog.
>>
>> Here's the current version of it:
>>
>> CogMemoryManager is currently a place-holder for the design of the new Cog VM's object representation and garbage collector.  The goals for the GC are
>>
>> - efficient object representation a la Eliot Miranda's VisualWorks 64-bit object representation that uses a 64-bit header, eliminating direct class references so that all objects refer to their classes indirectly.  Instead the header contains a constant class index, in a field smaller than a full pointer, These class indices are used in inline and first-level method caches, hence they do not have to be updated on GC (although they do have to be traced to be able to GC classes).  Classes are held in a sparse weak table.  The class table needs only to be indexed by an instance's class index in class hierarchy search, in the class primitive, and in tracing live objects in the heap.  The additional header space is allocated to a much expanded identity hash field, reducing hash efficiency problems in identity collections due to the extremely small (11 bit) hash field in the old Squeak GC.  The identity hash field is also a key element of the class index scheme.  A class's identity hash is its index into the class table, so to create an instance of a class one merely copies its identity hash into the class index field of the new instance.  This implies that when classes gain their identity hash they are entered into the class table and their identity hash is that of a previously unused index in the table.  It also implies that there is a maximum number of classes in the table.  At least for a few years 64k classes should be enough.  A class is entered into the class table in the following operations:
>>      behaviorHash
>>      adoptInstance
>>      instantiate
>>      become  (i.e. if an old class becomes a new class)
>>              if target class field's = to original's id hash
>>                 and replacement's id hash is zero
>>                      enter replacement in class table
>> behaviorHash is a special version of identityHash that must be implemented in the image by any object that can function as a class (i.e. Behavior).
>>
>> - more immediate classes.  An immediate Character class would speed up String accessing, especially for WideString, since no instatiation needs to be done on at:put: and no dereference need be done on at:.  In a 32-bit system tag checking is complex since it is thought important to retain 31-bit SmallIntegers.  Hence, as in current Squeak, the least significant bit set implies a SmallInteger, but Characters would likely have a tag pattern of xxx10.  Hence masking with 11 results in two values for SmallInteger, xxx01 and xxx11.  30-bit characters are more than adequate for Unicode.  In a 64-bit system we can use the full three bits and usefully implement an immediate Float.  As in VisualWorks a functional representation takes three bits away from the exponent.  Rotating to put the sign bit in the least significant non-tag bit makes expanding and contracting the 8-bit exponent to the 11-bit IEEE double exponent easy ad makes comparing negative and positive zero easier (an immediate Float is zero if its unsigned 64-bits are < 16).  So the representation looks like
>>      | 8 bit exponent | 52 bit mantissa | sign bit | 3 tag bits |
>> For details see "60-bit immediate Floats" below.
>>
>>
>> - efficient scavenging.  The current Squeak GC uses a slow pointer-reversal collector that writes every field in live objects three times in each collection, twice in the pointer-reversing heap traversal to mark live objects and once to update the pointer to its new location.  A scavenger writes every field of live data twice in each collection, once as it does a block copy of the object when copying to to space, once as it traverses the live pointers in the to space objects.  Of course the block copy is a relatively cheap write.
>>
>> - lazy become.  The JIT's use of inline cacheing provides a cheap way of avoiding scanning the heap as part of a become (which is the simple approach to implementing become in a system with direct pointers).  A becomeForward: on a (set of) non-zero-sized object(s) turns the object into a "corpse" or "forwarding object" whose first (non-header) word/slot is replaced by a pointer to the target of the becomeForward:.  The corpse's class index is set to one that identifies corpses and, because it is a hidden class index, will always fail an inline cache test.  The inline cache failure code is then responsible for following the forwarding pointer chain (these are Iliffe vectors :) ) and resolving to the actual target.  We have yet to determine exactly how this is done (e.g. change the receiver register and/or stack contents and retry the send, perhaps scanning the current activation).  See below on how we deal with becomes on objects with named inst vars.  Note that we probably don't have to worry about zero-sized objects.  These are unlikely to be passed through the FFI (there is nothing to pass :) ) and so will rarely be becommed.  If they do, they can become slowly.  Alternatively we can insist that objects are at least 16 bytes in size (see a8-byte alignment below) so that there will always be space for a forwarding pointer.  Since none of the immediate classes can have non-immediate instances and since we allocate the immediate classes indices corresponding to their tag pattern (SmallInteger = 1, Character = 3, SmallFloat = 4?) we can use all the class indices from 0 to 7 for special uses, 0 = forward, and e.g. 1 = header-sized filler.
>>
>> - pinning.  To support a robust and easy-to-use FFI the memory manager must support temporary pinning where individual objects can be prevented from being moved by the GC for as long as required, either by being one of an in-progress FFI call's arguments, or by having pinning asserted by a primitive (allowing objects to be passed to external code that retains a reference to the object after returning).  Pinning probably implies a per-object "is-pinned" bit in the object header.  Pinning will be done via lazy become; i..e an object in new space will be becommed into a pinned object in old space.  We will only support pinning in old space.
>>
>> - efficient old space collection.  An incremental collector (a la Dijkstra's three colour algorithm) collects old space, e.g. via an amount of tracing being hung off scavenges and/or old space allocations at an adaptive rate that keeps full garbage collections to a minimum.  (see free space/free list below)
>>
>> - 8-byte alignment.  It is advantageous for the FFI, for floating-point access, for object movement and for 32/64-bit compatibility to keep object sizes in units of 8 bytes.  For the FFI, 8-byte alignment means objects can be passed directly to code that expects that alignment (such as modern x86 numeric processing instructions).  This implies that
>>      - the starts of all spaces are aligned on 8-byte boundaries
>>      - object allocation rounds up the requested size to a multiple of 8 bytes
>>      - the overflow size field is also 8 bytes
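The rounding rule in the second bullet is the usual power-of-two alignment trick; a minimal sketch (the helper name is illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Round a requested byte size up to the 8-byte allocation unit, as the
   allocator described above must do on every allocation. */
static size_t align8(size_t n) { return (n + 7) & ~(size_t)7; }
```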
>> We shall probably keep the minimum object size at 16 bytes so that there is always room for a forwarding pointer.  But this implies that we will need to implement an 8-byte filler to fill holes between objects > 16 bytes whose length mod 16 bytes is 8 bytes and following pinned objects.  We can do this using a special class index, e.g. 1, so that the method that answers the size of an object looks like, e.g.
>>      chunkSizeOf: oop
>>              <var: #oop type: #'object *'>
>>              ^oop classIndex = 1
>>                      ifTrue: [BaseHeaderSize]
>>                      ifFalse: [BaseHeaderSize
>>                                + (oop slotSize = OverflowSlotSize
>>                                              ifTrue: [OverflowSizeBytes]
>>                                              ifFalse: [0])
>>                                + (oop slotSize * BytesPerSlot)]
>>
>>      chunkStartOf: oop
>>              <var: #oop type: #'object *'>
>>              ^(self cCoerceSimple: oop to: #'char *')
>>                      - ((oop classIndex = 1
>>                          or: [oop slotSize ~= OverflowSlotSize])
>>                                      ifTrue: [0]
>>                                      ifFalse: [OverflowSizeBytes])
>>
>> For the moment we do not tackle the issue of heap growth and shrinkage with the ability to allocate and deallocate heap segments via memory-mapping.  This technique allows space to be released back to the OS by unmapping empty segments.  We may revisit this but it is not a key requirement for the first implementation.
>>
>> The basic approach is to use a fixed size new space and a growable old space.  The new space is a classic three-space nursery a la Ungar's Generation Scavenging, a large eden for new objects and two smaller survivor spaces that exchange roles on each collection, one being the to space to which surviving objects are copied, the other being the from space of the survivors of the previous collection, i.e. the previous to space.
>>
>> To provide apparent pinning in new space we rely on lazy become.  Since most pinned objects will be byte data and these do not require stack zone activation scanning, the overhead is simply an old space allocation and corpsing.
>>
>> To provide pinning in old space, large objects are implicitly pinned (because it is expensive to move large objects and, because they are both large and relatively rare, they contribute little to overall fragmentation - as in aggregates, small objects can be used to fill in the spaces between large objects).  Hence, objects above a particular size are automatically allocated in old space, rather than new space, and the pin bit is set automatically when allocating a large object.  Small objects are pinned as per objects in new space, by asserting the pin bit.  As a last resort, or by programmer control (the fullGC primitive), old space is collected via mark-sweep (mark-compact), and so the mark phase must build the list of pinned objects around which the sweep/compact phase must carefully step.
>>
>> Free space in old space is organized by a free list/free tree as in Eliot's VisualWorks 5i old space allocator.  There are 64 free lists, indices 1 through 63 holding blocks of space of that size, index 0 holding a semi-balanced ordered tree of free blocks, each node being the head of the list of free blocks of that size.  At the start of the mark phase the free list is thrown away and the sweep phase coalesces free space and steps over pinned objects as it proceeds.  We can reuse the forwarding pointer compaction scheme used in the old collector.  Incremental collections merely move unmarked objects to the free lists (as well as nilling weak pointers in weak arrays and scheduling them for finalization).  The occupancy of the free lists is represented by a bitmap in a 64-bit integer, so that an allocation of size 63 or less can know whether a free chunk of that size exists and, more importantly, whether a larger free chunk exists in the fixed-size free lists, without having to search all larger free list heads.
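The occupancy-bitmap search can be sketched in C, assuming a GCC/Clang-style `__builtin_ctzll` to find the lowest set bit; the names and the bit assignment are illustrative, not the actual Cog code:

```c
#include <assert.h>
#include <stdint.h>

/* Bit i of the occupancy mask is set when free list i (blocks of i
   allocation units, 1 <= i <= 63) is non-empty; bit 0 stands for the
   tree of larger blocks. */

/* Answer the index of the smallest non-empty fixed-size list holding
   chunks of at least n units, or 0 if only the tree can satisfy the
   request -- a single mask-and-count, no per-list scan. */
static int smallestListFor(uint64_t occupancy, int n) {
    uint64_t candidates = occupancy & ~((1ULL << n) - 1);  /* lists of size >= n */
    if (candidates == 0) return 0;                         /* fall back to the tree */
    return __builtin_ctzll(candidates);                    /* lowest set bit */
}
```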
>>
>> The incremental collector (a la Dijkstra's three colour algorithm) collects old space via an amount of tracing hung off scavenges and/or old space allocations at an adaptive rate that keeps full garbage collections to a minimum.  [N.B. Not sure how to do this yet.  The incremental collector needs to complete a pass often enough to reclaim objects, but infrequently enough not to waste time.  So some form of feedback should work.  In VisualWorks tracing is broken into quanta of work where image-level code determines the size of a quantum based on how fast the machine is and how big the heap is.  This code could easily live in the VM, controllable through vmParameterAt:put:.  An alternative would be to use the heartbeat to bound quanta by time.  But in any case some amount of incremental collection would be done on old space allocation and scavenging, the amount being chosen to keep pause times acceptably short, and at a rate that reclaims old space before a full GC is required, i.e. at a rate proportional to the growth in old space.]  The incremental collector is a state machine, being either marking, nilling weak pointers, or freeing.  If nilling weak pointers is not done atomically then there must be a read barrier in weak array at: so that reading from an old space weak array holding stale un-nilled references to unmarked objects answers nil for those references.  Tricks such as including the weak bit in bounds calculations can make this cheap for non-weak arrays.  Alternatively, nilling weak pointers can be made an atomic part of incremental collection, which can be made cheaper by maintaining the set of weak arrays (e.g. on a list).
>>
>> The incremental collector implies a more complex write barrier.  Objects are of three colours: black, having been scanned; grey, being scanned; and white, unreached.  A mark stack holds the grey objects.  If the incremental collector is marking and an unmarked white object is stored into a black object then the stored object must become grey, being added to the mark stack.  So the write barrier is essentially
>>      target isYoung ifFalse:
>>              [newValue isYoung
>>                      ifTrue: [target isInRememberedSet ifFalse:
>>                                      [target addToRememberedSet]] "target now refers to a young object; it is a root for scavenges"
>>                      ifFalse:
>>                              [(target isBlack
>>                                and: [igc marking
>>                                and: [newValue isWhite]]) ifTrue:
>>                                      [newValue beGrey]]] "add newValue to IGC's markStack for subsequent scanning"
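The same store check can be written as runnable C against a toy object model; the flag-per-field representation and all names are assumptions for illustration, not the actual barrier code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object model for illustrating the store check above. */
typedef struct Obj {
    bool isYoung, isBlack, isWhite, inRememberedSet, isGrey;
} Obj;

static bool igcMarking = true;   /* is the incremental GC in its marking phase? */

static void addToRememberedSet(Obj *o) { o->inRememberedSet = true; }
static void beGrey(Obj *o) { o->isGrey = true; o->isWhite = false; }

/* The check run on every store of newValue into a pointer slot of target. */
static void storeCheck(Obj *target, Obj *newValue) {
    if (target->isYoung) return;         /* stores into new space need no barrier */
    if (newValue->isYoung) {             /* old -> young: target becomes a scavenge root */
        if (!target->inRememberedSet)
            addToRememberedSet(target);
    } else if (target->isBlack && igcMarking && newValue->isWhite) {
        beGrey(newValue);                /* push onto the IGC's mark stack */
    }
}
```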
>>
>> The incremental collector does not detect already marked objects all of whose references have been overwritten by other stores (e.g. in the above if newValue overwrites the sole remaining reference to a marked object).  So the incremental collector only guarantees to collect all garbage created in cycle N at the end of cycle N + 1.  The cost is hence slightly worse memory density but the benefit, provided the IGC works hard enough, is the elimination of long pauses due to full garbage collections, which become actions of last resort or programmer desire.
>>
>> Lazy become.
>>
>> As described earlier the basic idea behind lazy become is to use corpses (forwarding objects) that are followed lazily during GC and inline cache miss.  However, a lazy scheme cannot be used on objects with named inst vars without adding checking to all inst var accesses, which we judge too expensive.  Instead, when becomming objects with named inst vars, we scan all activations in the stack zone, eagerly becomming these references, and we check for corpses when faulting in a context into the stack zone.  Essentially, the invariant is that there are no references to corpses from the receiver slots of stack activations.  A detail is whether we allow or forbid pinning of closure indirection vectors, or scan the entire stack of each activation.  Using a special class index pun for indirection vectors is a cheap way of preventing their becomming/pinning etc.  Although "don't do that" (don't attempt to pin/become indirection vectors) is also an acceptable response.
>>
>> 60-bit immediate Floats
>> Representation for immediate doubles, used only in the 64-bit implementation.  Immediate doubles have the same 52-bit mantissa as IEEE double-precision floating-point, but only 8 bits of exponent.  So they occupy just less than the middle 1/8th of the double range.  They overlap the normal single-precision floats, which also have 8-bit exponents, but exclude the single-precision denormals (exponent -127) and the single-precision NaNs (exponent +127).  +/- zero is just the pair of values with both exponent and mantissa 0.
>> So the non-zero immediate doubles range from
>>         +/- 0x3800,0000,0000,0001 / 5.8774717541114d-39
>> to      +/- 0x47ff,ffff,ffff,ffff / 6.8056473384188d+38
>> The encoded tagged form has the sign bit moved to the least significant bit, which allows for faster encode/decode because offsetting the exponent can't overflow into the sign bit and because testing for +/- 0 is an unsigned compare for <= 0xf:
>>     msb                                                                                        lsb
>>     [8 exponent subset bits][52 mantissa bits ][1 sign bit][3 tag bits]
>> So assuming the tag is 5, the tagged non-zero bit patterns are
>>              0x0000,0000,0000,001[d/5]
>> to           0xffff,ffff,ffff,fff[d/5]
>> and +/- 0d is 0x0000,0000,0000,000[d/5]
>> Encode/decode of non-zero values in machine code looks like:
>>                        msb                                  lsb
>> Decode:                [8expsubset][52mantissa][1s][3tags]
>>   shift away tags:     [ 000 ][8expsubset][52mantissa][1s]
>>   add exponent offset: [     11 exponent     ][52mantissa][1s]
>>   rot sign:            [1s][     11 exponent     ][52mantissa]
>>
>> Encode:                [1s][     11 exponent     ][52mantissa]
>>   rot sign:            [     11 exponent     ][52mantissa][1s]
>>   sub exponent offset: [ 000 ][8expsubset][52mantissa][1s]
>>   shift:               [8expsubset][52mantissa][1s][ 000 ]
>>   or/add tags:         [8expsubset][52mantissa][1s][3tags]
>> but is slower in C because
>> a) there is no rotate operator, and
>> b) raw conversion between double and quadword must (at least in the source) move bits through memory (quadword = *(q64 *)&doubleVariable).
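The steps above can be sketched as C, assuming a tag value of 5 as in the example and an exponent offset of 1023 - 127 = 896 placed above the 52 mantissa bits and the rotated sign bit. This is illustrative, not the actual Cog JIT code; as the diagram notes, only non-zero values take this path, so +/- zero would be special-cased:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TAG_BITS   3
#define FLOAT_TAG  5ULL
#define EXP_OFFSET (896ULL << 53)   /* exponent sits above 52 mantissa bits + 1 sign bit */

static uint64_t rotl1(uint64_t x) { return (x << 1) | (x >> 63); }
static uint64_t rotr1(uint64_t x) { return (x >> 1) | (x << 63); }

/* memcpy is the portable way to move raw bits between double and quadword. */
static uint64_t doubleBits(double d)     { uint64_t b; memcpy(&b, &d, 8); return b; }
static double   bitsToDouble(uint64_t b) { double d;   memcpy(&d, &b, 8); return d; }

/* Encode a non-zero in-range double: rotate the sign to the lsb,
   subtract the exponent offset, shift up and add the tag. */
static uint64_t encodeSmallFloat(double d) {
    return ((rotl1(doubleBits(d)) - EXP_OFFSET) << TAG_BITS) | FLOAT_TAG;
}

/* Decode: shift away the tag, add the exponent offset back, rotate the
   sign back to the msb. */
static double decodeSmallFloat(uint64_t oop) {
    return bitsToDouble(rotr1((oop >> TAG_BITS) + EXP_OFFSET));
}
```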
>>
>> Issues:
>> How do we avoid the Size4Bit for 64 bits?  The format word encodes the number of odd bytes, but currently has only 4 bits and hence only supports odd bytes of 0 - 3.  We need odd bytes of 0 - 7.  But I don't like the separate Size4Bit.  Best to change the VI code and have a 5-bit format?  We lose one bit but save two bits (isEphemeron and isWeak, or three, if isPointers) for a net gain of one (or two).
>>
>> Further, keep Squeak's format idea or go for separate bits?  For 64 bits we need a 5-bit format field.  This contrasts with isPointers, isWeak, isEphemeron, and 3 odd-size bits (or a byte size).  The format field is quite economical.
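To make the comparison concrete, here is a sketch of field extraction from a 64-bit header using the 5-bit format field discussed above. The field positions (20-bit class index, 20-bit identity hash, 5-bit format, 8-bit slot size) are one possible packing based on the layout floated earlier in this thread, not a settled design:

```c
#include <assert.h>
#include <stdint.h>

/* Extract an unsigned field of `bits` width starting at `shift`. */
#define HDR_FIELD(h, shift, bits) (((h) >> (shift)) & ((1ULL << (bits)) - 1))

static unsigned headerClassIndex(uint64_t h)   { return (unsigned)HDR_FIELD(h,  0, 20); }
static unsigned headerIdentityHash(uint64_t h) { return (unsigned)HDR_FIELD(h, 20, 20); }
static unsigned headerFormat(uint64_t h)       { return (unsigned)HDR_FIELD(h, 44,  5); }
static unsigned headerSlotSize(uint64_t h)     { return (unsigned)HDR_FIELD(h, 49,  8); }
```

With this packing 53 bits are spoken for, leaving 11 for immutability, GC bits and experimentation.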
>>
>> Are class indices in inline caches strong references to classes or weak references?
>> If strong then they must be scanned during GC and the methodZone must be flushed on fullGC to reclaim all classes (this looks to be a bug in the V3 Cogit).
>> If weak then, when the class table loses references, PICs containing freed classes must be freed, and sends to freed PICs or to send sites whose caches contain freed classes must be unlinked.
>> The second approach is faster (the common case is scanning the class table; the uncommon case is freeing classes) and better: in-line caches do not prevent the reclamation of classes.
>>
>>
>> Stef
>>
>>
>>
>> --
>> best,
>> Eliot
>>
>





--
best,
Eliot





--
Mariano
http://marianopeck.wordpress.com


Re: [Pharo-project] Plan/discussion/communication around new object format

Stefan Marr-3
In reply to this post by Eliot Miranda-2

Hi Eliot:

From my experience with the RoarVM, it seems to be a rather simple exercise to enable the VM to support a custom 'pre-header' for objects.
That is, a constant offset in the memory that comes from the allocator, and is normally ignored by the GC.

That allows me to do all kinds of stuff.  Of course at the cost of a word per object, and at the cost of recompiling the VM.
But that should be a reasonable price to pay for someone doing research on these kinds of things.

Sometimes a few bits are just not enough, and such a pre-header gives much, much more flexibility.
For the people interested in that, I could dig out the details (I think I did that already once on this list).

Best regards
Stefan


On 30 May 2012, at 22:22, Eliot Miranda wrote:

>
>
> On Wed, May 30, 2012 at 12:59 PM, Stéphane Ducasse <[hidden email]> wrote:
> I would like to be sure that we can have
>        - bit for immutable objects
>        - bits for experimenting.
>
> There will be quite a few.  And one will be able to steal bits from the class field if one needs fewer classes.  I'm not absolutely sure of the layout yet.  But for example
>
> 8: slot size (255 => extra header word with large size)
> 3: odd bytes/fixed fields (odd bytes for non-pointer, fixed fields for pointer, 7 => # fixed fields is in the class)
> 4 bits: format (pointers, indexable, bytes/shorts/longs/doubles indexable, compiled method, ephemerons, weak, etc)
> 1: immutability
> 3: GC 2 mark bits. 1 forwarded bit
> 20: identity hash
> 20: class index
>
> still leaves 5 bits unused.  And stealing 4 bits each from class index still leaves 64k classes.  So this format is simple and provides lots of unused bits.  The format field is a great idea as it combines a number of orthogonal properties in very few bits.  I don't want to include odd bytes in format because I think a separate field that holds odd bytes and fixed fields is better use of space.  But we can gather statistics before deciding.
>
>
> Stef
>
> On May 30, 2012, at 8:48 AM, Stéphane Ducasse wrote:
>
> > Hi guys
> >
> > Here is an important topic I would like to see discussed so that we see how we can improve and join forces.
> > May a mail discussion then a wiki for the summary would be good.
> >
> >
> > stef
> >
> >
> >
> > Begin forwarded message:
> >
> >> From: Eliot Miranda <[hidden email]>
> >> Subject: Re: Plan/discussion/communication around new object format
> >> Date: May 27, 2012 10:49:54 PM GMT+02:00
> >> To: Stéphane Ducasse <[hidden email]>
> >>
> >>
> >>
> >> On Sat, May 26, 2012 at 1:46 AM, Stéphane Ducasse <[hidden email]> wrote:
> >> Hi eliot
> >>
> >> do you have a description of the new object format you want to introduce?
> >>
> >> The design is in the class comment of CogMemoryManager in the Cog VMMaker packages.
> >>
> >> Then what is your schedule?
> >>
> >> This is difficult. I have made a small start and should be able to spend time on it starting soon.  I want to have it finished by early next year.  But it depends on work schedules etc.
> >>
> >>
> >> I would like to see if we can allocate igor/esteban time before we run out of money
> >> to help on that important topic.
> >> Now the solution is unclear and I did not see any document where we can evaluate
> >> and plan how we can help. So do you want help on that topic? Then how can people
> >> contribute if any?
> >>
> >> The first thing to do is to read the design document, to see if the Pharo community thinks it is the right direction, and to review it, spot deficiencies etc.  So please get those interested to read the class comment of CogMemoryManager in the latest VMMaker.oscog.
> >>
> >> Here's the current version of it:
> >>
> >> CogMemoryManager is currently a place-holder for the design of the new Cog VM's object representation and garbage collector.  The goals for the GC are
> >>
> >> - efficient object representation a la Eliot Miranda's VisualWorks 64-bit object representation that uses a 64-bit header, eliminating direct class references so that all objects refer to their classes indirectly.  Instead the header contains a constant class index, in a field smaller than a full pointer, These class indices are used in inline and first-level method caches, hence they do not have to be updated on GC (although they do have to be traced to be able to GC classes).  Classes are held in a sparse weak table.  The class table needs only to be indexed by an instance's class index in class hierarchy search, in the class primitive, and in tracing live objects in the heap.  The additional header space is allocated to a much expanded identity hash field, reducing hash efficiency problems in identity collections due to the extremely small (11 bit) hash field in the old Squeak GC.  The identity hash field is also a key element of the class index scheme.  A class's identity hash is its index into the class table, so to create an instance of a class one merely copies its identity hash into the class index field of the new instance.  This implies that when classes gain their identity hash they are entered into the class table and their identity hash is that of a previously unused index in the table.  It also implies that there is a maximum number of classes in the table.  At least for a few years 64k classes should be enough.  A class is entered into the class table in the following operations:
> >>      behaviorHash
> >>      adoptInstance
> >>      instantiate
> >>      become  (i.e. if an old class becomes a new class)
> >>              if target class field's = to original's id hash
> >>                 and replacement's id hash is zero
> >>                      enter replacement in class table
> >> behaviorHash is a special version of identityHash that must be implemented in the image by any object that can function as a class (i.e. Behavior).
> >>
> >> - more immediate classes.  An immediate Character class would speed up String accessing, especially for WideString, since no instatiation needs to be done on at:put: and no dereference need be done on at:.  In a 32-bit system tag checking is complex since it is thought important to retain 31-bit SmallIntegers.  Hence, as in current Squeak, the least significant bit set implies a SmallInteger, but Characters would likely have a tag pattern of xxx10.  Hence masking with 11 results in two values for SmallInteger, xxx01 and xxx11.  30-bit characters are more than adequate for Unicode.  In a 64-bit system we can use the full three bits and usefully implement an immediate Float.  As in VisualWorks a functional representation takes three bits away from the exponent.  Rotating to put the sign bit in the least significant non-tag bit makes expanding and contracting the 8-bit exponent to the 11-bit IEEE double exponent easy ad makes comparing negative and positive zero easier (an immediate Float is zero if its unsigned 64-bits are < 16).  So the representation looks like
> >>      | 8 bit exponent | 52 bit mantissa | sign bit | 3 tag bits |
> >> For details see "60-bit immediate Floats" below.
> >>
> >>
> >> - efficient scavenging.  The current Squeak GC uses a slow pointer-reversal collector that writes every field in live objects three times in each collection, twice in the pointer-reversing heap traversal to mark live objects and once to update the pointer to its new location.  A scavenger writes every field of live data twice in each collection, once as it does a block copy of the object when copying to to space, once as it traverses the live pointers in the to space objects.  Of course the block copy is a relatively cheap write.
> >>
> >> - lazy become.  The JIT's use of inline cacheing provides a cheap way of avoiding scanning the heap as part of a become (which is the simple approach to implementing become in a system with direct pointers).  A becomeForward: on a (set of) non-zero-sized object(s) turns the object into a "corpse" or "forwarding object" whose first (non-header) word/slot is replaced by a pointer to the target of the becomeForward:.  The corpse's class index is set to one that identifies corpses and, because it is a hidden class index, will always fail an inline cache test.  The inline cache failure code is then responsible for following the forwarding pointer chain (these are Iliffe vectors :) ) and resolving to the actual target.  We have yet to determine exactly how this is done (e.g. change the receiver register and/or stack contents and retry the send, perhaps scanning the current activation).  See below on how we deal with becomes on objects with named inst vars.  Note that we probably don't have to worry about zero-sized objects.  These are unlikely to be passed through the FFI (there is nothing to pass :) ) and so will rarely be becommed.  If they do, they can become slowly.  Alternatively we can insist that objects are at least 16 bytes in size (see a8-byte alignment below) so that there will always be space for a forwarding pointer.  Since none of the immediate classes can have non-immediate instances and since we allocate the immediate classes indices corresponding to their tag pattern (SmallInteger = 1, Character = 3, SmallFloat = 4?) we can use all the class indices from 0 to 7 for special uses, 0 = forward, and e.g. 1 = header-sized filler.
> >>
> >> - pinning.  To support a robust and easy-to-use FFI the memory manager must support temporary pinning where individual objects can be prevented from being moved by the GC for as long as required, either by being one of an in-progress FFI call's arguments, or by having pinning asserted by a primitive (allowing objects to be passed to external code that retains a reference to the object after returning).  Pinning probably implies a per-object "is-pinned" bit in the object header.  Pinning will be done via lazy become; i..e an object in new space will be becommed into a pinned object in old space.  We will only support pinning in old space.
> >>
> >> - efficient old space collection.  An incremental collector (a la Dijkstra's three colour algorithm) collects old space, e.g. via an amount of tracing being hung off scavenges and/or old space allocations at an adaptive rate that keeps full garbage collections to a minimum.  (see free space/free list below)
> >>
> >> - 8-byte alignment.  It is advantageous for the FFI, for floating-point access, for object movement and for 32/64-bit compatibility to keep object sizes in units of 8 bytes.  For the FFI, 8-byte alignment means passing objects to code that expects that requirement (such as modern x86 numeric processing instructions).  This implies that
> >>      - the starts of all spaces are aligned on 8-byte boundaries
> >>      - object allocation rounds up the requested size to a multiple of 8 bytes
> >>      - the overflow size field is also 8 bytes
> >> We shall probably keep the minimum object size at 16 bytes so that there is always room for a forwarding pointer.  But this implies that we will need to implement an 8-byte filler to fill holes between objects > 16 bytes whose length mod 16 bytes is 8 bytes and following pinned objects.  We can do this using a special class index, e.g. 1, so that the method that answers the size of an object looks like, e.g.
> >>      chunkSizeOf: oop
> >>              <var: #oop type: #'object *'>
> >>              ^object classIndex = 1
> >>                      ifTrue: [BaseHeaderSize]
> >>                      ifFalse: [BaseHeaderSize
> >>                                + (object slotSize = OverflowSlotSize
> >>                                              ifTrue: [OverflowSizeBytes]
> >>                                              ifFalse: [0])
> >>                                + (object slotSize * BytesPerSlot)]
> >>
> >>      chunkStartOf: oop
> >>              <var: #oop type: #'object *'>
> >>              ^(self cCoerceSimple: oop to: #'char *')
> >>                      - ((object classIndex = 1
> >>                          or: [object slotSize ~= OverflowSlotSize])
> >>                                      ifTrue: [0]
> >>                                      ifFalse: [OverflowSizeBytes])
> >>
> >> For the moment we do not tackle the issue of heap growth and shrinkage with the ability to allocate and deallocate heap segments via memory-mapping.  This technique allows space to be released back to the OS by unmapping empty segments.  We may revisit this but it is not a key requirement for the first implementation.
> >>
> >> The basic approach is to use a fixed size new space and a growable old space.  The new space is a classic three-space nursery a la Ungar's Generation Scavenging, a large eden for new objects and two smaller survivor spaces that exchange roles on each collection, one being the to space to which surviving objects are copied, the other being the from space of the survivors of the previous collection, i.e. the previous to space.
> >>
> >> To provide apparent pinning in new space we rely on lazy become.  Since most pinned objects will be byte data and these do not require stack zone activation scanning, the overhead is simply an old space allocation and corpsing.
> >>
> >> To provide pinning in old space, large objects are implicitly pinned (because it is expensive to move large objects and, because they are both large and relatively rare, they contribute little to overall fragmentation - as in aggregates, small objects can be used to fill-in the spaces between karge objects).  Hence, objects above a particular size are automatically allocated in old space, rather than new space.  Small objects are pinned as per objects in new space, by asserting the pin bit, which will be set automaticaly when allocating a large object.  As a last resort, or by programmer control (the fullGC primitive) old space is collected via mark-sweep (mark-compact) and so the mark phase must build the list of pinned objects around which the sweep/compact phase must carefully step.
> >>
> >> Free space in old space is organized by a free list/free tree as in Eliot's VisualWorks 5i old space allocator.  There are 64 free lists, indices 1 through 63 holding blocks of space of that size, index 0 holding a semi-balanced ordered tree of free blocks, each node being the head of the list of free blocks of that size.  At the start of the mark phase the free list is thrown away and the sweep phase coallesces free space and steps over pinned objects as it proceeds.  We can reuse the forwarding pointer compaction scheme used in the old collector.  Incremental collections merely move unmarked objects to the free lists (as well as nilling weak pointers in weak arrays and scheduling them for finalization).  The occupancy of the free lists is represented by a bitmap in a 64-bit integer so that an allocation of size 63 or less can know whether there exists a free chunk of that size, but more importantly can know whether a free chunk larger than it exists in the fixed size free lists without having to search all larger free list heads.
> >>
> >> The incremental collector (a la Dijkstra's three colour algorithm) collects old space via an amount of tracing being hung off scavenges and/or old space allocations at an adaptive rate that keeps full garbage collections to a minimum.  [N.B. Not sure how to do this yet.  The incremental collector needs to complete a pass often enough to reclaim objects, but infrequent enough not to waste time.  So some form of feedback should work.  In VisualWorks tracing is broken into quanta or work where image-level code determines the size of a quantum based on how fast the machine is, and how big the heap is.  This code could easily live in the VM, controllable through vmParameterAt:put:.  An alternative would be to use the heartbeat to bound quanta by time.  But in any case some amount of incremental collection would be done on old space allocation and scavenging, the ammount being chosen to keep pause times acceptably short, and at a rate to reclaim old space before a full GC is required, i.e. at a rate proportional to the growth in old space]. The incemental collector is a state machine, being either marking, nilling weak pointers, or freeing.  If nilling weak pointers is not done atomically then there must be a read barrier in weak array at: so that reading from an old space weak array that is holding stale un-nilled references to unmarked objects.  Tricks such as including the weak bit in bounds calculations can make this cheap for non-weak arrays.  Alternatively nilling weak pointers can be made an atomic part of incremental collection, which can be made cheaper by maintaining the set of weak arrays (e.g. on a list).
> >>
> >> The incremental collector implies a more complex write barrier.  Objects are of three colours, black, having been scanned, grey, being scanned, and white, unreached.  A mark stack holds the grey objects.   If the incremental collector is marking and an unmarked white object is stored into a black object then the stored object must become grey, being added to the mark stack.  So the wrte barrier is essentially
> >>      target isYoung ifFalse:
> >>              [newValue isYoung
> >>                      ifTrue: [target isInRememberedSet ifFalse:
> >>                                      [target addToRememberedSet]] "target now refers to a young object; it is a root for scavenges"
> >>                      ifFalse:
> >>                              [(target isBlack
> >>                                and: [igc marking
> >>                                and: [newValue isWhite]]) ifTrue:
> >>                                      [newValue beGrey]]] "add newValue to IGC's markStack for subsequent scanning"
> >>
> >> The incremental collector does not detect already marked objects all of whose references have been overwritten by other stores (e.g. in the above if newValue overwrites the sole remaining reference to a marked object).  So the incremental collector only guarantees to collect all garbage created in cycle N at the end of cycle N + 1.  The cost is hence slightly worse memory density but the benefit, provided the IGC works hard enough, is the elimination of long pauses due to full garbage collections, which become actions of last resort or programmer desire.
> >>
> >> Lazy become.
> >>
> >> As described earlier the basic idea behind lazy become is to use corpses (forwarding objects) that are followed lazily during GC and inline cache miss.  However, a lazy scheme cannot be used on objects with named inst vars without adding checking to all inst var accesses, which we judge too expensive.  Instead, when becomming objects with named inst vars, we scan all activations in the stack zone, eagerly becomming these references, and we check for corpses when faulting in a context into the stack zone.  Essentially, the invariant is that there are no references to corpses from the receiver slots of stack activations.  A detail is whether we allow or forbid pinning of closure indirection vectors, or scan the entire stack of each activation.  Using a special class index pun for indirection vectors is a cheap way of preventing their becomming/pinning etc.  Although "don't do that" (don't attempt to pin/become indirection vectors) is also an acceptable response.
> >>
> >> 60-bit immediate Floats
> >> Representation for immediate doubles, only used in the 64-bit implementation.  Immediate doubles have the same 52 bit mantissa as IEEE double-precision floating-point, but only have 8 bits of exponent.  So they occupy just less than the middle 1/8th of the double range.  They overlap the normal single-precision floats, which also have 8 bit exponents, but exclude the single-precision denormals (exponent -127) and the single-precision NaNs (exponent +127).  +/- zero is just a pair of values with both exponent and mantissa 0.
> >> So the non-zero immediate doubles range from
> >>         +/- 0x3800,0000,0000,0001 / 5.8774717541114d-39
> >> to      +/- 0x47ff,ffff,ffff,ffff / 6.8056473384188d+38
> >> The encoded tagged form has the sign bit moved to the least significant bit, which allows for faster encode/decode because offsetting the exponent can't overflow into the sign bit and because testing for +/- 0 is an unsigned compare for <= 0xf:
> >>     msb                                                                                        lsb
> >>     [8 exponent subset bits][52 mantissa bits ][1 sign bit][3 tag bits]
> >> So assuming the tag is 5, the tagged non-zero bit patterns are
> >>              0x0000,0000,0000,001[d/5]
> >> to           0xffff,ffff,ffff,fff[d/5]
> >> and +/- 0d is 0x0000,0000,0000,000[d/5]
> >> Encode/decode of non-zero values in machine code looks like:
> >>                          msb                              lsb
> >> Decode:                  [8expsubset][52mantissa][1s][3tags]
> >> shift away tags:         [ 000 ][8expsubset][52mantissa][1s]
> >> add exponent offset:     [     11 exponent     ][52mantissa][1s]
> >> rot sign:                [1s][     11 exponent     ][52mantissa]
> >>
> >> Encode:                  [1s][     11 exponent     ][52mantissa]
> >> rot sign:                [     11 exponent     ][52mantissa][1s]
> >> sub exponent offset:     [ 000 ][8expsubset][52mantissa][1s]
> >> shift:                   [8expsubset][52mantissa][1s][ 000 ]
> >> or/add tags:             [8expsubset][52mantissa][1s][3tags]
> >> but is slower in C because
> >> a) there is no rotate, and
> >> b) raw conversion between double and quadword must (at least in the source) move bits through memory ( quadword = *(q64 *)&doubleVariable).
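The C version of the encode/decode above can be sketched as follows. This is an illustrative sketch, not the VM's actual code: the tag value 5 is taken from the text's "assuming the tag is 5", and all function names, plus the exact bit positions, are assumptions.

```c
#include <stdint.h>
#include <string.h>

#define TAG_BITS        3
#define SMALL_FLOAT_TAG 5ULL
/* (1023 - 127) = 896, positioned at the 11-bit exponent field (bits 53..63
   once the sign has been rotated down to the least significant bit). */
#define EXPONENT_OFFSET (896ULL << 53)

static uint64_t doubleBits(double d)   { uint64_t b; memcpy(&b, &d, 8); return b; }
static double   bitsDouble(uint64_t b) { double d; memcpy(&d, &b, 8); return d; }

/* Does d fit the 8-bit exponent subset (0x380..0x47F), or is it +/- 0? */
static int isImmediateFloat(double d) {
    uint64_t b = doubleBits(d), exp = (b >> 52) & 0x7FF;
    return (b << 1) == 0 || (exp >= 0x380 && exp <= 0x47F);
}

static uint64_t encodeSmallFloat(double d) {
    uint64_t b = doubleBits(d);
    if ((b << 1) == 0)                         /* +/- 0: sign bit + tag only */
        return (b >> 60) | SMALL_FLOAT_TAG;
    b = (b << 1) | (b >> 63);                  /* "rot sign" via shift-or in C */
    b -= EXPONENT_OFFSET;                      /* 11-bit -> 8-bit exponent subset */
    return (b << TAG_BITS) | SMALL_FLOAT_TAG;  /* shift and add tags */
}

static double decodeSmallFloat(uint64_t tagged) {
    if (tagged <= 0xF)                         /* +/- 0: the unsigned compare */
        return bitsDouble((tagged >> TAG_BITS) << 63);
    uint64_t b = tagged >> TAG_BITS;           /* shift away tags */
    b += EXPONENT_OFFSET;                      /* 8-bit -> 11-bit exponent */
    return bitsDouble((b >> 1) | (b << 63));   /* rotate sign back to the msb */
}
```

For example, 1.0 (raw bits 0x3FF0000000000000) encodes to 0x7F00000000000005 under this tag assignment, and decoding reverses each step exactly.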
> >>
> >> Issues:
> >> How do we avoid the Size4Bit for 64-bits?  The format word encodes the number of odd bytes, but currently has only 4 bits and hence only supports odd bytes of 0 - 3.  We need odd bytes of 0 - 7.  But I don't like the separate Size4Bit.  Best to change the VI code and have a 5 bit format?  We lose one bit but save two bits (isEphemeron and isWeak (or three, if isPointers)) for a net gain of one (or two).
> >>
> >> Further, keep Squeak's format idea or go for separate bits?  For 64-bits we need a 5 bit format field.  This contrasts with isPointers, isWeak, isEphemeron, and 3 odd size bits (or a byte size).  The format field is quite economical.
> >>
> >> Are class indices in inline caches strong references to classes or weak references?
> >> If strong then they must be scanned during GC and the methodZone must be flushed on fullGC to reclaim all classes (this looks to be a bug in the V3 Cogit).
> >> If weak then, when the class table loses references, PICs containing freed classes must be freed, and sends linked to freed PICs or to PICs containing freed classes must be unlinked.
> >> The second approach is faster: the common case is merely scanning the class table; the uncommon case is freeing classes.  It is also better in that in-line caches do not prevent the reclamation of classes.
> >>
> >>
> >> Stef
> >>
> >>
> >>
> >> --
> >> best,
> >> Eliot
> >>
> >
>
>
>
>
>
> --
> best,
> Eliot
>

--
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525


Re: [Pharo-project] Plan/discussion/communication around new object format

Eliot Miranda-2
 


On Wed, May 30, 2012 at 1:57 PM, Stefan Marr <[hidden email]> wrote:

Hi Eliot:

From my experience with the RoarVM, it seems to be a rather simple exercise to enable the VM to support a custom 'pre-header' for objects.
That is, a constant offset in the memory that comes from the allocator, and is normally ignored by the GC.

That's a great idea.  So for experimental use one simply throws a whole word at the problem and forgets about the issue.  Thanks, Stefan.  That leaves me free to focus on something fast and compact that is still flexible.  Great!
 

That allows me to do all kinds of stuff. Of course at the cost of a word per object, and at the cost of recompiling the VM.
But that should be a reasonable price to pay for someone doing research on these kinds of things.
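Stefan's pre-header can be sketched in a few lines of C. This is only an illustration of the idea (a constant offset below the object pointer, ignored by the VM proper); every name here is invented, not RoarVM code.

```c
#include <stdint.h>
#include <stdlib.h>

/* The allocator reserves a constant number of words below each object;
   the GC and the rest of the VM never look at them. */
enum { PreheaderWords = 1 };

static void *allocateWithPreheader(size_t objectBytes) {
    uint64_t *chunk = calloc(PreheaderWords + (objectBytes + 7) / 8,
                             sizeof(uint64_t));
    if (!chunk) return NULL;
    return chunk + PreheaderWords;   /* the VM sees only this pointer */
}

/* Experimental metadata lives at a constant negative offset. */
static uint64_t *preheaderOf(void *object) {
    return (uint64_t *)object - PreheaderWords;
}
```

The cost is exactly as stated: one word per object and a VM recompile, in exchange for an arbitrary per-object word of experimental state.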

Sometimes a few bits are just not enough, and such a pre-header gives much much more flexibility.
For the people interested in that, I could dig out the details (I think, I did that already once on this list).

Best regards
Stefan


On 30 May 2012, at 22:22, Eliot Miranda wrote:

>
>
> On Wed, May 30, 2012 at 12:59 PM, Stéphane Ducasse <[hidden email]> wrote:
> I would like to be sure that we can have
>        - bit for immutable objects
>        - bits for experimenting.
>
> There will be quite a few.  And one will be able to steal bits from the class field if one needs fewer classes.  I'm not absolutely sure of the layout yet.  But for example
>
> 8: slot size (255 => extra header word with large size)
> 3: odd bytes/fixed fields (odd bytes for non-pointer, fixed fields for pointer, 7 => # fixed fields is in the class)
> 4 bits: format (pointers, indexable, bytes/shorts/longs/doubles indexable, compiled method, ephemerons, weak, etc)
> 1: immutability
> 3: GC 2 mark bits. 1 forwarded bit
> 20: identity hash
> 20: class index
>
> still leaves 5 bits unused.  And stealing 4 bits each from class index still leaves 64k classes.  So this format is simple and provides lots of unused bits.  The format field is a great idea as it combines a number of orthogonal properties in very few bits.  I don't want to include odd bytes in format because I think a separate field that holds odd bytes and fixed fields is better use of space.  But we can gather statistics before deciding.
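The example layout above (8 + 3 + 4 + 1 + 3 + 20 + 20 = 59 bits) can be made concrete with field accessors. The bit positions below assume the fields are packed from the least significant bit in the order listed; the email stresses the layout is not final, so treat these positions as an assumption.

```c
#include <stdint.h>

/* One possible packing of the proposed 64-bit object header. */
#define HDR_SLOT_SIZE(h)   ((unsigned)( (h)        & 0xFF))     /* 8 bits  */
#define HDR_ODD_FIXED(h)   ((unsigned)(((h) >>  8) & 0x7))      /* 3 bits  */
#define HDR_FORMAT(h)      ((unsigned)(((h) >> 11) & 0xF))      /* 4 bits  */
#define HDR_IMMUTABLE(h)   ((unsigned)(((h) >> 15) & 0x1))      /* 1 bit   */
#define HDR_GC_BITS(h)     ((unsigned)(((h) >> 16) & 0x7))      /* 2 mark + 1 forwarded */
#define HDR_IDENT_HASH(h)  ((unsigned)(((h) >> 19) & 0xFFFFF))  /* 20 bits */
#define HDR_CLASS_INDEX(h) ((unsigned)(((h) >> 39) & 0xFFFFF))  /* 20 bits; top 5 bits unused */
```

Instantiation under the scheme described earlier is then just copying the class's identity hash into the new object's class index field.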
>
>
> Stef
>
> On May 30, 2012, at 8:48 AM, Stéphane Ducasse wrote:
>
> > Hi guys
> >
> > Here is an important topic I would like to see discussed so that we see how we can improve and join forces.
> > May a mail discussion then a wiki for the summary would be good.
> >
> >
> > stef
> >
> >
> >
> > Begin forwarded message:
> >
> >> From: Eliot Miranda <[hidden email]>
> >> Subject: Re: Plan/discussion/communication around new object format
> >> Date: May 27, 2012 10:49:54 PM GMT+02:00
> >> To: Stéphane Ducasse <[hidden email]>
> >>
> >>
> >>
> >> On Sat, May 26, 2012 at 1:46 AM, Stéphane Ducasse <[hidden email]> wrote:
> >> Hi eliot
> >>
> >> do you have a description of the new object format you want to introduce?
> >>
> >> The design is in the class comment of CogMemoryManager in the Cog VMMaker packages.
> >>
> >> Then what is your schedule?
> >>
> >> This is difficult. I have made a small start and should be able to spend time on it starting soon.  I want to have it finished by early next year.  But it depends on work schedules etc.
> >>
> >>
> >> I would like to see if we can allocate igor/esteban time before we run out of money
> >> to help on that important topic.
> >> Now the solution is unclear and I did not see any document where we can evaluate
> >> and plan how we can help. So do you want help on that topic? Then how can people
> >> contribute if any?
> >>
> >> The first thing to do is to read the design document, to see if the Pharo community thinks it is the right direction, and to review it, spot deficiencies etc.  So please get those interested to read the class comment of CogMemoryManager in the latest VMMaker.oscog.
> >>
> >> Here's the current version of it:
> >>
> >> CogMemoryManager is currently a place-holder for the design of the new Cog VM's object representation and garbage collector.  The goals for the GC are
> >>
> >> - efficient object representation a la Eliot Miranda's VisualWorks 64-bit object representation that uses a 64-bit header, eliminating direct class references so that all objects refer to their classes indirectly.  Instead, the header contains a constant class index, in a field smaller than a full pointer.  These class indices are used in inline and first-level method caches, hence they do not have to be updated on GC (although they do have to be traced to be able to GC classes).  Classes are held in a sparse weak table.  The class table needs only to be indexed by an instance's class index in class hierarchy search, in the class primitive, and in tracing live objects in the heap.  The additional header space is allocated to a much expanded identity hash field, reducing hash efficiency problems in identity collections due to the extremely small (11 bit) hash field in the old Squeak GC.  The identity hash field is also a key element of the class index scheme.  A class's identity hash is its index into the class table, so to create an instance of a class one merely copies its identity hash into the class index field of the new instance.  This implies that when classes gain their identity hash they are entered into the class table and their identity hash is that of a previously unused index in the table.  It also implies that there is a maximum number of classes in the table.  At least for a few years 64k classes should be enough.  A class is entered into the class table in the following operations:
> >>      behaviorHash
> >>      adoptInstance
> >>      instantiate
> >>      become  (i.e. if an old class becomes a new class)
> >>              if target class field's = to original's id hash
> >>                 and replacement's id hash is zero
> >>                      enter replacement in class table
> >> behaviorHash is a special version of identityHash that must be implemented in the image by any object that can function as a class (i.e. Behavior).
> >>
> >> - more immediate classes.  An immediate Character class would speed up String accessing, especially for WideString, since no instantiation needs to be done on at:put: and no dereference need be done on at:.  In a 32-bit system tag checking is complex since it is thought important to retain 31-bit SmallIntegers.  Hence, as in current Squeak, the least significant bit set implies a SmallInteger, but Characters would likely have a tag pattern of xxx10.  Hence masking with binary 11 results in two values for SmallInteger, xxx01 and xxx11.  30-bit characters are more than adequate for Unicode.  In a 64-bit system we can use the full three bits and usefully implement an immediate Float.  As in VisualWorks a functional representation takes three bits away from the exponent.  Rotating to put the sign bit in the least significant non-tag bit makes expanding and contracting the 8-bit exponent to the 11-bit IEEE double exponent easy and makes comparing negative and positive zero easier (an immediate Float is zero if its unsigned 64-bits are < 16).  So the representation looks like
> >>      | 8 bit exponent | 52 bit mantissa | sign bit | 3 tag bits |
> >> For details see "60-bit immediate Floats" below.
> >>
> >>
> >> - efficient scavenging.  The current Squeak GC uses a slow pointer-reversal collector that writes every field in live objects three times in each collection, twice in the pointer-reversing heap traversal to mark live objects and once to update the pointer to its new location.  A scavenger writes every field of live data twice in each collection, once as it does a block copy of the object when copying to to space, once as it traverses the live pointers in the to space objects.  Of course the block copy is a relatively cheap write.
> >>
> >> - lazy become.  The JIT's use of inline caching provides a cheap way of avoiding scanning the heap as part of a become (which is the simple approach to implementing become in a system with direct pointers).  A becomeForward: on a (set of) non-zero-sized object(s) turns the object into a "corpse" or "forwarding object" whose first (non-header) word/slot is replaced by a pointer to the target of the becomeForward:.  The corpse's class index is set to one that identifies corpses and, because it is a hidden class index, will always fail an inline cache test.  The inline cache failure code is then responsible for following the forwarding pointer chain (these are Iliffe vectors :) ) and resolving to the actual target.  We have yet to determine exactly how this is done (e.g. change the receiver register and/or stack contents and retry the send, perhaps scanning the current activation).  See below on how we deal with becomes on objects with named inst vars.  Note that we probably don't have to worry about zero-sized objects.  These are unlikely to be passed through the FFI (there is nothing to pass :) ) and so will rarely be becommed.  If they do, they can become slowly.  Alternatively we can insist that objects are at least 16 bytes in size (see 8-byte alignment below) so that there will always be space for a forwarding pointer.  Since none of the immediate classes can have non-immediate instances and since we allocate the immediate classes' indices corresponding to their tag pattern (SmallInteger = 1, Character = 3, SmallFloat = 4?) we can use all the class indices from 0 to 7 for special uses, 0 = forward, and e.g. 1 = header-sized filler.
> >>
> >> - pinning.  To support a robust and easy-to-use FFI the memory manager must support temporary pinning where individual objects can be prevented from being moved by the GC for as long as required, either by being one of an in-progress FFI call's arguments, or by having pinning asserted by a primitive (allowing objects to be passed to external code that retains a reference to the object after returning).  Pinning probably implies a per-object "is-pinned" bit in the object header.  Pinning will be done via lazy become; i.e. an object in new space will be becommed into a pinned object in old space.  We will only support pinning in old space.
> >>
> >> - efficient old space collection.  An incremental collector (a la Dijkstra's three colour algorithm) collects old space, e.g. via an amount of tracing being hung off scavenges and/or old space allocations at an adaptive rate that keeps full garbage collections to a minimum.  (see free space/free list below)
> >>
> >> - 8-byte alignment.  It is advantageous for the FFI, for floating-point access, for object movement and for 32/64-bit compatibility to keep object sizes in units of 8 bytes.  For the FFI, 8-byte alignment means passing objects to code that expects that requirement (such as modern x86 numeric processing instructions).  This implies that
> >>      - the starts of all spaces are aligned on 8-byte boundaries
> >>      - object allocation rounds up the requested size to a multiple of 8 bytes
> >>      - the overflow size field is also 8 bytes
> >> We shall probably keep the minimum object size at 16 bytes so that there is always room for a forwarding pointer.  But this implies that we will need to implement an 8-byte filler to fill holes between objects > 16 bytes whose length mod 16 bytes is 8 bytes and following pinned objects.  We can do this using a special class index, e.g. 1, so that the method that answers the size of an object looks like, e.g.
> >>      chunkSizeOf: oop
> >>              <var: #oop type: #'object *'>
> >>              ^oop classIndex = 1
> >>                      ifTrue: [BaseHeaderSize]
> >>                      ifFalse: [BaseHeaderSize
> >>                                + (oop slotSize = OverflowSlotSize
> >>                                              ifTrue: [OverflowSizeBytes]
> >>                                              ifFalse: [0])
> >>                                + (oop slotSize * BytesPerSlot)]
> >>
> >>      chunkStartOf: oop
> >>              <var: #oop type: #'object *'>
> >>              ^(self cCoerceSimple: oop to: #'char *')
> >>                      - ((oop classIndex = 1
> >>                          or: [oop slotSize ~= OverflowSlotSize])
> >>                                      ifTrue: [0]
> >>                                      ifFalse: [OverflowSizeBytes])
> >>
> >> For the moment we do not tackle the issue of heap growth and shrinkage with the ability to allocate and deallocate heap segments via memory-mapping.  This technique allows space to be released back to the OS by unmapping empty segments.  We may revisit this but it is not a key requirement for the first implementation.
> >>
> >> The basic approach is to use a fixed size new space and a growable old space.  The new space is a classic three-space nursery a la Ungar's Generation Scavenging, a large eden for new objects and two smaller survivor spaces that exchange roles on each collection, one being the to space to which surviving objects are copied, the other being the from space of the survivors of the previous collection, i.e. the previous to space.
> >>
> >> To provide apparent pinning in new space we rely on lazy become.  Since most pinned objects will be byte data and these do not require stack zone activation scanning, the overhead is simply an old space allocation and corpsing.
> >>
> >> To provide pinning in old space, large objects are implicitly pinned (because it is expensive to move large objects and, because they are both large and relatively rare, they contribute little to overall fragmentation - as in aggregates, small objects can be used to fill in the spaces between large objects).  Hence, objects above a particular size are automatically allocated in old space, rather than new space.  Small objects are pinned as per objects in new space, by asserting the pin bit, which will be set automatically when allocating a large object.  As a last resort, or by programmer control (the fullGC primitive) old space is collected via mark-sweep (mark-compact) and so the mark phase must build the list of pinned objects around which the sweep/compact phase must carefully step.
> >>
> >> Free space in old space is organized by a free list/free tree as in Eliot's VisualWorks 5i old space allocator.  There are 64 free lists, indices 1 through 63 holding blocks of space of that size, index 0 holding a semi-balanced ordered tree of free blocks, each node being the head of the list of free blocks of that size.  At the start of the mark phase the free list is thrown away and the sweep phase coalesces free space and steps over pinned objects as it proceeds.  We can reuse the forwarding pointer compaction scheme used in the old collector.  Incremental collections merely move unmarked objects to the free lists (as well as nilling weak pointers in weak arrays and scheduling them for finalization).  The occupancy of the free lists is represented by a bitmap in a 64-bit integer so that an allocation of size 63 or less can know whether there exists a free chunk of that size, but more importantly can know whether a free chunk larger than it exists in the fixed size free lists without having to search all larger free list heads.
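The occupancy bitmap query can be sketched in C. This is an illustrative sketch with invented names; it uses a GCC/Clang builtin (`__builtin_ctzll`) to find the lowest set bit instead of scanning all 63 list heads.

```c
#include <stdint.h>

/* Bit i says whether the list of free chunks of size i is non-empty;
   bit 0 stands for the tree of larger chunks at index 0. */
static uint64_t freeListBitmap;

static void noteNonEmpty(unsigned sizeIndex) { freeListBitmap |=  1ULL << sizeIndex; }
static void noteEmpty(unsigned sizeIndex)    { freeListBitmap &= ~(1ULL << sizeIndex); }

/* Smallest non-empty list holding chunks of at least n units (1 <= n <= 63),
   or 0 meaning "fall back to the tree of large chunks". */
static unsigned firstListAtLeast(unsigned n) {
    uint64_t candidates = freeListBitmap & ~((1ULL << n) - 1);
    return candidates ? (unsigned)__builtin_ctzll(candidates) : 0;
}
```

Masking off the low `n` bits and taking the count of trailing zeros answers "is there a chunk of this size, or any larger fixed size?" in two instructions.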
> >>
> >> The incremental collector (a la Dijkstra's three colour algorithm) collects old space via an amount of tracing being hung off scavenges and/or old space allocations at an adaptive rate that keeps full garbage collections to a minimum.  [N.B. Not sure how to do this yet.  The incremental collector needs to complete a pass often enough to reclaim objects, but infrequently enough not to waste time.  So some form of feedback should work.  In VisualWorks tracing is broken into quanta of work where image-level code determines the size of a quantum based on how fast the machine is, and how big the heap is.  This code could easily live in the VM, controllable through vmParameterAt:put:.  An alternative would be to use the heartbeat to bound quanta by time.  But in any case some amount of incremental collection would be done on old space allocation and scavenging, the amount being chosen to keep pause times acceptably short, and at a rate to reclaim old space before a full GC is required, i.e. at a rate proportional to the growth in old space].  The incremental collector is a state machine, being either marking, nilling weak pointers, or freeing.  If nilling weak pointers is not done atomically then there must be a read barrier in weak array at: so that reading from an old space weak array that is holding stale un-nilled references to unmarked objects does not answer those dead objects.  Tricks such as including the weak bit in bounds calculations can make this cheap for non-weak arrays.  Alternatively nilling weak pointers can be made an atomic part of incremental collection, which can be made cheaper by maintaining the set of weak arrays (e.g. on a list).
> >>
> >> The incremental collector implies a more complex write barrier.  Objects are of three colours: black (having been scanned), grey (being scanned), and white (unreached).  A mark stack holds the grey objects.  If the incremental collector is marking and an unmarked white object is stored into a black object then the stored object must become grey, being added to the mark stack.  So the write barrier is essentially
> >>      target isYoung ifFalse:
> >>              [newValue isYoung
> >>                      ifTrue: [target isInRememberedSet ifFalse:
> >>                                      [target addToRememberedSet]] "target now refers to a young object; it is a root for scavenges"
> >>                      ifFalse:
> >>                              [(target isBlack
> >>                                and: [igc marking
> >>                                and: [newValue isWhite]]) ifTrue:
> >>                                      [newValue beGrey]]] "add newValue to IGC's markStack for subsequent scanning"
> >>
> >
>
>
>
>
>
> --
> best,
> Eliot
>





--
best,
Eliot


Re: [Pharo-project] Plan/discussion/communication around new object format

Eliot Miranda-2
In reply to this post by Mariano Martinez Peck
 


On Wed, May 30, 2012 at 1:53 PM, Mariano Martinez Peck <[hidden email]> wrote:
 


On Wed, May 30, 2012 at 10:22 PM, Eliot Miranda <[hidden email]> wrote:
 


On Wed, May 30, 2012 at 12:59 PM, Stéphane Ducasse <[hidden email]> wrote:
I would like to be sure that we can have
       - bit for immutable objects
       - bits for experimenting.

There will be quite a few.  And one will be able to steal bits from the class field if one needs fewer classes.  I'm not absolutely sure of the layout yet.  But for example

8: slot size (255 => extra header word with large size)
3: odd bytes/fixed fields (odd bytes for non-pointer, fixed fields for pointer, 7 => # fixed fields is in the class)
4 bits: format (pointers, indexable, bytes/shorts/longs/doubles indexable, compiled method, ephemerons, weak, etc)
1: immutability
3: GC 2 mark bits. 1 forwarded bit
20: identity hash

and we can make it lazy, that is, we compute it not at instantiation time but rather the first time the primitive is called.

 
20: class index

This would probably work for a while. I think that it would be good to leave an "open door" so that in the future we can just add one more word for a class pointer.

Turns out that's not such a simple change.  Class indices have two advantages.  One is that they're more compact (2^20 classes is still a lot of classes).  The other is that they're constant, which has two main benefits.  First, in method caches and in-line caches the class field holds an index and hence doesn't need to be updated by the GC.  The GC no longer has to visit send sites.  Second, because they're constants, both checking for well-known classes and instantiating well-known classes can be done without going to the specialObjectsArray.  One just uses the constant.  Now undoing these optimizations to open a back-door is not trivial.  So best accept the benefits and exploit them to a maximum.
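The in-line cache benefit can be sketched as follows. This is hypothetical C standing in for code the Cogit would emit as machine code; the field position, indices, and every name are invented for illustration.

```c
#include <stdint.h>

#define CLASS_INDEX_SHIFT 39        /* assumed header position; layout not final */
#define CLASS_INDEX_MASK  0xFFFFFULL

typedef struct {
    uint32_t cachedClassIndex;      /* a small constant: the GC never visits it */
    int (*target)(void);            /* the method linked for that class */
} SendSite;

static unsigned classIndexOf(uint64_t header) {
    return (unsigned)((header >> CLASS_INDEX_SHIFT) & CLASS_INDEX_MASK);
}

static int answerMethod(void) { return 42; }  /* stand-in compiled method */
static int cacheMiss(void)    { return -1; }  /* would relink, following forwarders */

static int send(SendSite *site, uint64_t receiverHeader) {
    return classIndexOf(receiverHeader) == site->cachedClassIndex
        ? site->target()            /* hit: straight into the compiled method */
        : cacheMiss();
}
```

Because `cachedClassIndex` is an index rather than an object pointer, a moving GC has nothing to fix up at the send site, and a well-known-class check is a compare against a compile-time literal.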

 

still leaves 5 bits unused.  And stealing 4 bits each from class index still leaves 64k classes.  So this format is simple and provides lots of unused bits.  The format field is a great idea as it combines a number of orthogonal properties in very few bits.  I don't want to include odd bytes in format because I think a separate field that holds odd bytes and fixed fields is better use of space.  But we can gather statistics before deciding.


Stef

On May 30, 2012, at 8:48 AM, Stéphane Ducasse wrote:

> Hi guys
>
> Here is an important topic I would like to see discussed so that we see how we can improve and join forces.
> May a mail discussion then a wiki for the summary would be good.
>
>
> stef
>
>
>
> Begin forwarded message:
>
>> From: Eliot Miranda <[hidden email]>
>> Subject: Re: Plan/discussion/communication around new object format
>> Date: May 27, 2012 10:49:54 PM GMT+02:00
>> To: Stéphane Ducasse <[hidden email]>
>>
>>
>>
>> On Sat, May 26, 2012 at 1:46 AM, Stéphane Ducasse <[hidden email]> wrote:
>> Hi eliot
>>
>> do you have a description of the new object format you want to introduce?
>>
>> The design is in the class comment of CogMemoryManager in the Cog VMMaker packages.
>>
>> Then what is your schedule?
>>
>> This is difficult. I have made a small start and should be able to spend time on it starting soon.  I want to have it finished by early next year.  But it depends on work schedules etc.
>>
>>
>> I would like to see if we can allocate igor/esteban time before we run out of money
>> to help on that important topic.
>> Now the solution is unclear and I did not see any document where we can evaluate
>> and plan how we can help. So do you want help on that topic? Then how can people
>> contribute if any?
>>
>> The first thing to do is to read the design document, to see if the Pharo community thinks it is the right direction, and to review it, spot deficiencies etc.  So please get those interested to read the class comment of CogMemoryManager in the latest VMMaker.oscog.
>>
>> Here's the current version of it:
>>
>> CogMemoryManager is currently a place-holder for the design of the new Cog VM's object representation and garbage collector.  The goals for the GC are
>>
>> - efficient object representation a la Eliot Miranda's VisualWorks 64-bit object representation that uses a 64-bit header, eliminating direct class references so that all objects refer to their classes indirectly.  Instead the header contains a constant class index, in a field smaller than a full pointer, These class indices are used in inline and first-level method caches, hence they do not have to be updated on GC (although they do have to be traced to be able to GC classes).  Classes are held in a sparse weak table.  The class table needs only to be indexed by an instance's class index in class hierarchy search, in the class primitive, and in tracing live objects in the heap.  The additional header space is allocated to a much expanded identity hash field, reducing hash efficiency problems in identity collections due to the extremely small (11 bit) hash field in the old Squeak GC.  The identity hash field is also a key element of the class index scheme.  A class's identity hash is its index into the class table, so to create an instance of a class one merely copies its identity hash into the class index field of the new instance.  This implies that when classes gain their identity hash they are entered into the class table and their identity hash is that of a previously unused index in the table.  It also implies that there is a maximum number of classes in the table.  At least for a few years 64k classes should be enough.  A class is entered into the class table in the following operations:
>>      behaviorHash
>>      adoptInstance
>>      instantiate
>>      become  (i.e. if an old class becomes a new class)
>>              if the target's class field = the original's id hash
>>                 and replacement's id hash is zero
>>                      enter replacement in class table
>> behaviorHash is a special version of identityHash that must be implemented in the image by any object that can function as a class (i.e. Behavior).
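The class-index scheme above can be illustrated with a toy model (all names invented): a class's identity hash doubles as its class-table index, so instantiation is just a copy of that hash into the new object's class-index field.

```python
# Toy model of the class table scheme: behaviorHash lazily enters a class
# into the table; instantiation copies its identity hash into the instance.
class_table = {}          # index -> class (weak in the real design)
next_index = 8            # 0-7 assumed reserved (forwarders, fillers, immediates)

def behavior_hash(cls):
    """Answer cls's identity hash, entering it into the class table on first use."""
    global next_index
    if getattr(cls, "_idhash", None) is None:
        cls._idhash = next_index
        class_table[next_index] = cls
        next_index += 1
    return cls._idhash

class Obj:
    def __init__(self, cls):
        # instantiate: merely copy the class's identity hash into the
        # class-index field; no class pointer is stored
        self.class_index = behavior_hash(cls)

class Point:
    pass

p = Obj(Point)
assert class_table[p.class_index] is Point
```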
>>
>> - more immediate classes.  An immediate Character class would speed up String accessing, especially for WideString, since no instantiation needs to be done on at:put: and no dereference need be done on at:.  In a 32-bit system tag checking is complex since it is thought important to retain 31-bit SmallIntegers.  Hence, as in current Squeak, the least significant bit set implies a SmallInteger, but Characters would likely have a tag pattern of xxx10.  Hence masking with 11 results in two values for SmallInteger, xxx01 and xxx11.  30-bit characters are more than adequate for Unicode.  In a 64-bit system we can use the full three bits and usefully implement an immediate Float.  As in VisualWorks a functional representation takes three bits away from the exponent.  Rotating to put the sign bit in the least significant non-tag bit makes expanding and contracting the 8-bit exponent to the 11-bit IEEE double exponent easy and makes comparing negative and positive zero easier (an immediate Float is zero if its unsigned 64-bits are < 16).  So the representation looks like
>>      | 8 bit exponent | 52 bit mantissa | sign bit | 3 tag bits |
>> For details see "60-bit immediate Floats" below.
>>
>>
>> - efficient scavenging.  The current Squeak GC uses a slow pointer-reversal collector that writes every field in live objects three times in each collection, twice in the pointer-reversing heap traversal to mark live objects and once to update the pointer to its new location.  A scavenger writes every field of live data twice in each collection, once as it does a block copy of the object when copying to to space, once as it traverses the live pointers in the to space objects.  Of course the block copy is a relatively cheap write.
>>
>> - lazy become.  The JIT's use of inline caching provides a cheap way of avoiding scanning the heap as part of a become (which is the simple approach to implementing become in a system with direct pointers).  A becomeForward: on a (set of) non-zero-sized object(s) turns the object into a "corpse" or "forwarding object" whose first (non-header) word/slot is replaced by a pointer to the target of the becomeForward:.  The corpse's class index is set to one that identifies corpses and, because it is a hidden class index, will always fail an inline cache test.  The inline cache failure code is then responsible for following the forwarding pointer chain (these are Iliffe vectors :) ) and resolving to the actual target.  We have yet to determine exactly how this is done (e.g. change the receiver register and/or stack contents and retry the send, perhaps scanning the current activation).  See below on how we deal with becomes on objects with named inst vars.  Note that we probably don't have to worry about zero-sized objects.  These are unlikely to be passed through the FFI (there is nothing to pass :) ) and so will rarely be becommed.  If they do, they can become slowly.  Alternatively we can insist that objects are at least 16 bytes in size (see 8-byte alignment below) so that there will always be space for a forwarding pointer.  Since none of the immediate classes can have non-immediate instances and since we allocate the immediate classes indices corresponding to their tag pattern (SmallInteger = 1, Character = 3, SmallFloat = 4?) we can use all the class indices from 0 to 7 for special uses, 0 = forward, and e.g. 1 = header-sized filler.
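The corpse-following step that the inline-cache miss path must perform can be sketched very simply (the class-index value and names are assumptions for illustration):

```python
# Illustrative sketch: following a chain of "corpses" (forwarding objects)
# to the real target, as the inline-cache failure code must do.
FORWARDED_CLASS_INDEX = 0   # hidden class index that always fails cache tests

class Corpse:
    class_index = FORWARDED_CLASS_INDEX
    def __init__(self, target):
        self.target = target    # stored in the first non-header slot

def follow(obj):
    """Resolve forwarding chains; non-corpses are returned unchanged."""
    while getattr(obj, "class_index", None) == FORWARDED_CLASS_INDEX:
        obj = obj.target
    return obj

real = "the new object"
assert follow(Corpse(Corpse(real))) == real   # chains are followed to the end
assert follow(real) == real                   # ordinary objects pass through
```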
>>
>> - pinning.  To support a robust and easy-to-use FFI the memory manager must support temporary pinning where individual objects can be prevented from being moved by the GC for as long as required, either by being one of an in-progress FFI call's arguments, or by having pinning asserted by a primitive (allowing objects to be passed to external code that retains a reference to the object after returning).  Pinning probably implies a per-object "is-pinned" bit in the object header.  Pinning will be done via lazy become; i.e. an object in new space will be becommed into a pinned object in old space.  We will only support pinning in old space.
>>
>> - efficient old space collection.  An incremental collector (a la Dijkstra's three colour algorithm) collects old space, e.g. via an amount of tracing being hung off scavenges and/or old space allocations at an adaptive rate that keeps full garbage collections to a minimum.  (see free space/free list below)
>>
>> - 8-byte alignment.  It is advantageous for the FFI, for floating-point access, for object movement and for 32/64-bit compatibility to keep object sizes in units of 8 bytes.  For the FFI, 8-byte alignment means passing objects to code that expects that requirement (such as modern x86 numeric processing instructions).  This implies that
>>      - the starts of all spaces are aligned on 8-byte boundaries
>>      - object allocation rounds up the requested size to a multiple of 8 bytes
>>      - the overflow size field is also 8 bytes
>> We shall probably keep the minimum object size at 16 bytes so that there is always room for a forwarding pointer.  But this implies that we will need to implement an 8-byte filler to fill holes between objects > 16 bytes whose length mod 16 bytes is 8 bytes and following pinned objects.  We can do this using a special class index, e.g. 1, so that the method that answers the size of an object looks like, e.g.
>>      chunkSizeOf: oop
>>              <var: #oop type: #'object *'>
>>              ^oop classIndex = 1
>>                      ifTrue: [BaseHeaderSize]
>>                      ifFalse: [BaseHeaderSize
>>                                + (oop slotSize = OverflowSlotSize
>>                                              ifTrue: [OverflowSizeBytes]
>>                                              ifFalse: [0])
>>                                + (oop slotSize * BytesPerSlot)]
>>
>>      chunkStartOf: oop
>>              <var: #oop type: #'object *'>
>>              ^(self cCoerceSimple: oop to: #'char *')
>>                      - ((oop classIndex = 1
>>                          or: [oop slotSize ~= OverflowSlotSize])
>>                                      ifTrue: [0]
>>                                      ifFalse: [OverflowSizeBytes])
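For readers less familiar with Slang, here is a rough Python rendering of the chunk-size computation sketched above, plus the 8-byte allocation rounding the design calls for. The constants are assumed values, and it is assumed (as the sketch implies) that the slot-size accessor already answers the value from the overflow word when one is present.

```python
# Rough Python rendering of the chunkSizeOf: sketch. All constants assumed.
BASE_HEADER_SIZE    = 8
BYTES_PER_SLOT      = 8
OVERFLOW_SLOT_SIZE  = 255    # slotSize field == 255 => extra size word present
OVERFLOW_SIZE_BYTES = 8
FILLER_CLASS_INDEX  = 1      # special class index for header-sized fillers

def round_up_to_8(n):
    """Allocation rounds requested sizes up to a multiple of 8 bytes."""
    return (n + 7) & ~7

def chunk_size(class_index, slot_size):
    if class_index == FILLER_CLASS_INDEX:
        return BASE_HEADER_SIZE                     # header-sized filler
    extra = OVERFLOW_SIZE_BYTES if slot_size == OVERFLOW_SLOT_SIZE else 0
    return BASE_HEADER_SIZE + extra + slot_size * BYTES_PER_SLOT

assert chunk_size(5, 2) == 24            # header + two 8-byte slots
assert chunk_size(FILLER_CLASS_INDEX, 99) == BASE_HEADER_SIZE
assert round_up_to_8(13) == 16
```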
>>
>> For the moment we do not tackle the issue of heap growth and shrinkage with the ability to allocate and deallocate heap segments via memory-mapping.  This technique allows space to be released back to the OS by unmapping empty segments.  We may revisit this but it is not a key requirement for the first implementation.
>>
>> The basic approach is to use a fixed size new space and a growable old space.  The new space is a classic three-space nursery a la Ungar's Generation Scavenging, a large eden for new objects and two smaller survivor spaces that exchange roles on each collection, one being the to space to which surviving objects are copied, the other being the from space of the survivors of the previous collection, i.e. the previous to space.
>>
>> To provide apparent pinning in new space we rely on lazy become.  Since most pinned objects will be byte data and these do not require stack zone activation scanning, the overhead is simply an old space allocation and corpsing.
>>
>> To provide pinning in old space, large objects are implicitly pinned (because it is expensive to move large objects and, because they are both large and relatively rare, they contribute little to overall fragmentation - as in aggregates, small objects can be used to fill in the spaces between large objects).  Hence, objects above a particular size are automatically allocated in old space, rather than new space.  Small objects are pinned as per objects in new space, by asserting the pin bit, which will be set automatically when allocating a large object.  As a last resort, or by programmer control (the fullGC primitive) old space is collected via mark-sweep (mark-compact) and so the mark phase must build the list of pinned objects around which the sweep/compact phase must carefully step.
>>
>> Free space in old space is organized by a free list/free tree as in Eliot's VisualWorks 5i old space allocator.  There are 64 free lists, indices 1 through 63 holding blocks of space of that size, index 0 holding a semi-balanced ordered tree of free blocks, each node being the head of the list of free blocks of that size.  At the start of the mark phase the free list is thrown away and the sweep phase coalesces free space and steps over pinned objects as it proceeds.  We can reuse the forwarding pointer compaction scheme used in the old collector.  Incremental collections merely move unmarked objects to the free lists (as well as nilling weak pointers in weak arrays and scheduling them for finalization).  The occupancy of the free lists is represented by a bitmap in a 64-bit integer so that an allocation of size 63 or less can know whether there exists a free chunk of that size, but more importantly can know whether a free chunk larger than it exists in the fixed size free lists without having to search all larger free list heads.
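The occupancy-bitmap trick above can be sketched in a few lines: bit i set means free list i is non-empty, so one mask-and-test finds the smallest non-empty list at or above the requested size without scanning the 63 list heads. Names are invented for illustration.

```python
# Sketch of the 64-bit free-list occupancy bitmap described above.
occupancy = 0

def note_free(size_class):
    """Record that a chunk joined free list `size_class` (1..63)."""
    global occupancy
    occupancy |= 1 << size_class

def smallest_list_at_least(n):
    """Index of the smallest non-empty list >= n, or None (then try the tree at 0)."""
    candidates = occupancy & ~((1 << n) - 1)     # clear bits below n
    if candidates == 0:
        return None
    return (candidates & -candidates).bit_length() - 1   # lowest set bit

note_free(8)
note_free(40)
assert smallest_list_at_least(5) == 8
assert smallest_list_at_least(9) == 40
assert smallest_list_at_least(41) is None        # fall back to the size tree
```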
>>
>> The incremental collector (a la Dijkstra's three colour algorithm) collects old space via an amount of tracing being hung off scavenges and/or old space allocations at an adaptive rate that keeps full garbage collections to a minimum.  [N.B. Not sure how to do this yet.  The incremental collector needs to complete a pass often enough to reclaim objects, but infrequently enough not to waste time.  So some form of feedback should work.  In VisualWorks tracing is broken into quanta of work where image-level code determines the size of a quantum based on how fast the machine is, and how big the heap is.  This code could easily live in the VM, controllable through vmParameterAt:put:.  An alternative would be to use the heartbeat to bound quanta by time.  But in any case some amount of incremental collection would be done on old space allocation and scavenging, the amount being chosen to keep pause times acceptably short, and at a rate to reclaim old space before a full GC is required, i.e. at a rate proportional to the growth in old space].  The incremental collector is a state machine, being either marking, nilling weak pointers, or freeing.  If nilling weak pointers is not done atomically then there must be a read barrier in weak array at: so that reading from an old space weak array that is holding stale un-nilled references to unmarked objects answers nil rather than the stale references.  Tricks such as including the weak bit in bounds calculations can make this cheap for non-weak arrays.  Alternatively nilling weak pointers can be made an atomic part of incremental collection, which can be made cheaper by maintaining the set of weak arrays (e.g. on a list).
>>
>> The incremental collector implies a more complex write barrier.  Objects are of three colours, black, having been scanned, grey, being scanned, and white, unreached.  A mark stack holds the grey objects.   If the incremental collector is marking and an unmarked white object is stored into a black object then the stored object must become grey, being added to the mark stack.  So the write barrier is essentially
>>      target isYoung ifFalse:
>>              [newValue isYoung
>>                      ifTrue: [target isInRememberedSet ifFalse:
>>                                      [target addToRememberedSet]] "target now refers to a young object; it is a root for scavenges"
>>                      ifFalse:
>>                              [(target isBlack
>>                                and: [igc marking
>>                                and: [newValue isWhite]]) ifTrue:
>>                                      [newValue beGrey]]] "add newValue to IGC's markStack for subsequent scanning"
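The write-barrier pseudocode above can be turned into a runnable sketch with stand-in objects (the object model and names here are invented; only the barrier logic follows the pseudocode):

```python
# Runnable sketch of the generational + tri-colour write barrier above.
class Obj:
    def __init__(self, young=True):
        self.young = young
        self.colour = "white"    # white = unreached, grey = on mark stack, black = scanned

remembered_set = set()
mark_stack = []
igc_marking = True               # the incremental GC is in its marking phase

def store_check(target, new_value):
    """Run on every pointer store of new_value into target."""
    if not target.young:
        if new_value.young:
            # target now refers to a young object; it is a root for scavenges
            remembered_set.add(target)
        elif target.colour == "black" and igc_marking and new_value.colour == "white":
            # preserve the tri-colour invariant: grey the white object
            new_value.colour = "grey"
            mark_stack.append(new_value)

old_black = Obj(young=False)
old_black.colour = "black"
white = Obj(young=False)
store_check(old_black, white)
assert white.colour == "grey" and mark_stack == [white]
```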
>>
>> The incremental collector does not detect already marked objects all of whose references have been overwritten by other stores (e.g. in the above if newValue overwrites the sole remaining reference to a marked object).  So the incremental collector only guarantees to collect all garbage created in cycle N at the end of cycle N + 1.  The cost is hence slightly worse memory density but the benefit, provided the IGC works hard enough, is the elimination of long pauses due to full garbage collections, which become actions of last resort or programmer desire.
>>
>> Lazy become.
>>
>> As described earlier the basic idea behind lazy become is to use corpses (forwarding objects) that are followed lazily during GC and inline cache miss.  However, a lazy scheme cannot be used on objects with named inst vars without adding checking to all inst var accesses, which we judge too expensive.  Instead, when becomming objects with named inst vars, we scan all activations in the stack zone, eagerly becomming these references, and we check for corpses when faulting in a context into the stack zone.  Essentially, the invariant is that there are no references to corpses from the receiver slots of stack activations.  A detail is whether we allow or forbid pinning of closure indirection vectors, or scan the entire stack of each activation.  Using a special class index pun for indirection vectors is a cheap way of preventing their becomming/pinning etc.  Although "don't do that" (don't attempt to pin/become indirection vectors) is also an acceptable response.
>>
>> 60-bit immediate Floats
>> Representation for immediate doubles, only used in the 64-bit implementation. Immediate doubles have the same 52 bit mantissa as IEEE double-precision floating-point, but only have 8 bits of exponent.  So they occupy just less than the middle 1/8th of the double range.  They overlap the normal single-precision floats which also have 8 bit exponents, but exclude the single-precision denormals (exponent -127) and the single-precision NaNs (exponent +127).  +/- zero is just a pair of values with both exponent and mantissa 0.
>> So the non-zero immediate doubles range from
>>         +/- 0x3800,0000,0000,0001 / 5.8774717541114d-39
>> to      +/- 0x47ff,ffff,ffff,ffff / 6.8056473384188d+38
>> The encoded tagged form has the sign bit moved to the least significant bit, which allows for faster encode/decode because offsetting the exponent can't overflow into the sign bit and because testing for +/- 0 is an unsigned compare for <= 0xf:
>>     msb                                                                                        lsb
>>     [8 exponent subset bits][52 mantissa bits ][1 sign bit][3 tag bits]
>> So assuming the tag is 5, the tagged non-zero bit patterns are
>>              0x0000,0000,0000,001[d/5]
>> to           0xffff,ffff,ffff,fff[d/5]
>> and +/- 0d is 0x0000,0000,0000,000[d/5]
>> Encode/decode of non-zero values in machine code looks like:
>>                      msb                                 lsb
>> Decode:              [8expsubset][52mantissa][1s][3tags]
>> shift away tags:     [ 000 ][8expsubset][52mantissa][1s]
>> add exponent offset: [     11 exponent     ][52mantissa][1s]
>> rot sign:            [1s][     11 exponent     ][52mantissa]
>>
>> Encode:              [1s][     11 exponent     ][52mantissa]
>> rot sign:            [     11 exponent     ][52mantissa][1s]
>> sub exponent offset: [ 000 ][8expsubset][52mantissa][1s]
>> shift:               [8expsubset][52mantissa][1s][ 000 ]
>> or/add tags:         [8expsubset][52mantissa][1s][3tags]
>> but is slower in C because
>> a) there is no rotate, and
>> b) raw conversion between double and quadword must (at least in the source) move bits through memory ( quadword = *(q64 *)&doubleVariable).
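As a sanity check, the rotate/offset scheme above can be sketched in Python. The tag value 5 is the post's example and the exponent offset of 896 is derived from the stated range (IEEE exponent fields 0x380..0x47f); `struct` performs the double<->quadword moves that the C caveat mentions. This only handles in-range doubles.

```python
# Sketch of the immediate-double encode/decode above. Tag value and helper
# names are illustrative; only in-range, non-NaN doubles are handled.
import struct

TAG_BITS, TAG = 3, 5
EXP_OFFSET = 896 << 53     # 11-bit bias 1023 minus 8-bit bias 127, pre-shifted
MASK64 = 2**64 - 1

def rot_left1(x):  return ((x << 1) | (x >> 63)) & MASK64
def rot_right1(x): return ((x >> 1) | (x << 63)) & MASK64

def encode(d):
    bits = struct.unpack("<Q", struct.pack("<d", d))[0]
    rotated = rot_left1(bits)            # move the sign from msb to lsb
    if rotated <= 1:                     # +/- zero: no exponent to offset
        return (rotated << TAG_BITS) | TAG
    return (((rotated - EXP_OFFSET) << TAG_BITS) & MASK64) | TAG

def decode(v):
    payload = v >> TAG_BITS
    bits = rot_right1(payload if payload <= 1 else (payload + EXP_OFFSET) & MASK64)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

for d in (0.0, -0.0, 1.0, -2.5, 3.14159):
    assert decode(encode(d)) == d
# both zeroes land at or below 0xf: the unsigned "is zero" test from the post
assert encode(0.0) <= 0xf and encode(-0.0) <= 0xf
```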
>>
>> Issues:
>> How do we avoid the Size4Bit for 64-bits?  The format word encodes the number of odd bytes, but currently has only 4 bits and hence only supports odd bytes of 0 - 3.  We need odd bytes of 0 - 7.  But I don't like the separate Size4Bit.  Best to change the VI code and have a 5 bit format?  We lose one bit but save two bits (isEphemeron and isWeak, or three, if isPointers) for a net gain of one (or two).
>>
>> Further, keep Squeak's format idea or go for separate bits?  For 64-bits we need a 5 bit format field.  This contrasts with isPointers, isWeak, isEphemeron, and 3 odd size bits (or a byte size).  The format field is quite economical.
>>
>> Are class indices in inline caches strong references to classes or weak references?
>> If strong then they must be scanned during GC and the methodZone must be flushed on fullGC to reclaim all classes (this looks to be a bug in the V3 Cogit).
>> If weak then, when the class table loses references, PICs containing freed classes must be freed, and then sends to freed PICs or to methods containing freed classes must be unlinked.
>> The second approach is faster; the common case is scanning the class table, the uncommon case is freeing classes.  The second approach is better; in-line caches do not prevent reclamation of classes.
>>
>>
>> Stef
>>
>>
>>
>> --
>> best,
>> Eliot
>>
>
--
best,
Eliot


Re: [Pharo-project] Plan/discussion/communication around new object format

Igor Stasenko
In reply to this post by Eliot Miranda-2
 
Here are a couple (2) of my highly valuable cents :)

2^20 for classes?
That might be fine (or even overkill) for Smalltalk systems, but could be
quite limiting for anyone who would like to experiment with implementing a
prototype-based framework,
where every object is a "class" by itself.

---
8: slot size (255 => extra header word with large size)
3: odd bytes/fixed fields (odd bytes for non-pointer, fixed fields for
pointer, 7 => # fixed fields is in the class)
4 bits: format (pointers, indexable, bytes/shorts/longs/doubles
indexable, compiled method, ephemerons, weak, etc)
1: immutability
3: GC 2 mark bits. 1 forwarded bit
20: identity hash
20: class index
---
What takes most of the space in the object header? Right - the hash!
Now, since we will have lazy become, I am back to my idea of having
extra & arbitrary properties
per object.

In a nutshell, the idea is to not store the hash in the object header, but
instead use just a single bit, 'hash present'.

When the identity hash of an object is requested (via the corresponding primitive)
the implementation could check if 'hash present' is set;
if it is not, we do a 'lazy become' of the existing object to the same object
copied into another place, but with the hash bit set, and with an extra 64-bit field
where the hash value can be stored.

So, when you request the identity hash of an object which doesn't have one,
the object goes from:
[header][...data.. ]
to a copy in a new memory region with the new layout:
[header][hash bits][...data..]

and the old object is, of course, 'corpsed' into a forwarding pointer to the new location.
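This lazy-hash-via-become proposal can be sketched as a toy model (all names invented; a real implementation would of course work on raw memory, not Python objects):

```python
# Toy rendering of the lazy-hash idea: objects start with no hash field; the
# first identityHash request "becomes" the object into a copy that has one,
# leaving a forwarder behind.
import itertools

_hash_gen = itertools.count(1000)    # stand-in hash generator

class Obj:
    def __init__(self):
        self.hash_present = False
        self.forward_to = None       # forwarding pointer when corpsed

def follow(obj):
    while obj.forward_to is not None:
        obj = obj.forward_to
    return obj

def identity_hash(obj):
    obj = follow(obj)
    if not obj.hash_present:
        copy = Obj()                 # copy with room for the hash field
        copy.hash_present = True
        copy.hash = next(_hash_gen)
        obj.forward_to = copy        # old object corpsed into a forwarder
        obj = copy
    return obj.hash

o = Obj()
h = identity_hash(o)
assert identity_hash(o) == h         # the hash is stable across the become
```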

The next step is going from holding just the hash to having an arbitrary &
dynamic number of extra fields per object.
In the same way, we use 1 extra bit, indicating that the object has extra properties.
And when the object doesn't have them, we lazy-become it from being
[header][...data.. ]
or
[header][hash bits][...data..]
to:
[header][hash bits][oop][...data..]

where 'oop' can be anything - an instance of Array/Dictionary (depending on
how the language side will decide to store the extra properties of the object)

This, for instance, would allow us to store extra properties for
special object formats like variable bytes or compiled methods, which
don't have instance variables.

No need to mention how useful it is to be able to attach extra properties
per object without changing the object's class.
And, of course, the freed 18 bits (20 - 2) in the header can be allocated
for other purposes.
(Stef, how many bits do you need for experiments? ;)

------------

About immediates zoo.

Keep in mind that the more immediates we have, the more complex the
implementation tends to be.

I would just keep 2 data types:
 - integers
 - floats

and a third, special 'arbitrary' immediate, which is seen by the VM as a 60-bit value.
The interpretation of this value depends on a lookup in a range table,
where the developer specifies the correspondence between a value
interval and a class:
[min .. max] -> class

The intervals, of course, cannot overlap.
Determining the class of such an immediate might be slower - O(log2(n)),
where n is the size of the range table - but on the other side,
how many different kinds of immediates can you fit into a 60-bit value?
Right, 2^60. Much more than the proposed 8, isn't it? :)

And this extra cost can be mitigated completely by the inline cache:
 - in the case of a regular reference, you must fetch the object's class and
then compare it with the one stored in the cache.
 - in the case of an immediate reference, you compare the immediate value with
the min and max stored in the cache fields.
If the value is in range, you got a cache hit and are free to proceed.
So, it's just 1 extra comparison compared to a 'classic' inline cache.
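The range-table lookup can be sketched with a binary search over non-overlapping intervals (all names and the example ranges are invented for illustration):

```python
# Sketch of the range-table idea: non-overlapping [min, max] intervals map a
# 60-bit immediate payload to its class, looked up by binary search.
import bisect

ranges = []   # sorted list of (min, max, class_name)

def register(lo, hi, cls):
    for a, b, _ in ranges:
        assert hi < a or lo > b, "intervals cannot overlap"
    bisect.insort(ranges, (lo, hi, cls))

def class_of_immediate(value):
    """Binary-search the range table; None means no class claims this value."""
    i = bisect.bisect_right(ranges, (value, float("inf"), None)) - 1
    if i >= 0 and ranges[i][0] <= value <= ranges[i][1]:
        return ranges[i][2]
    return None

register(0, 2**30 - 1, "Character")
register(2**30, 2**31 - 1, "SmallInteger")
assert class_of_immediate(65) == "Character"
assert class_of_immediate(2**30 + 7) == "SmallInteger"
assert class_of_immediate(2**31) is None
```

An inline cache, as the post notes, would skip this search entirely: it just re-checks the cached (min, max) pair against the immediate value.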

And, after thinking about how the inline cache is organized, you can now scratch
my first paragraph related to immediates!
We really don't need to discriminate between small integers/floats/the rest!!
They could also be nothing more than one of the ranges defined in
our zoo of 'special' immediates!

So, at the end we will have just two kinds of references:
 - zero bit == 0 -- an object pointer
 - zero bit == 1 -- an immediate

Voila!.

We can have a real zoo of immediates, and a simple implementation to support them.
Not to mention that the range table is provided by the language side, so we're
free to rearrange them at any moment.

And of course, it doesn't mean that the VM could not reserve some of the
ranges for its own 'contracted'
immediates, like Characters, and even a class reference, for example.
Think about it :)

Re: [Pharo-project] Plan/discussion/communication around new object format

Levente Uzonyi-2
 
On Wed, 30 May 2012, Igor Stasenko wrote:

>
> Here are couple (2) of mine, highly valuable cents :)
>
> 2^20 for classes?
> might be fine (or even overkill) for smalltalk systems, but could be
> quite limiting for one who would like experiment and implementing a
> prototype-based frameworks,
> where every object is a "class" by itself.

I think it's more important to have a fast Smalltalk VM than one that is
slower but might fit some concrete experiment which might happen sometime
and would get a performance boost from the implementation.

>
> ---
> 8: slot size (255 => extra header word with large size)
> 3: odd bytes/fixed fields (odd bytes for non-pointer, fixed fields for
> pointer, 7 => # fixed fields is in the class)
> 4 bits: format (pointers, indexable, bytes/shorts/longs/doubles
> indexable, compiled method, ephemerons, weak, etc)
> 1: immutability
> 3: GC 2 mark bits. 1 forwarded bit
> 20: identity hash
> 20: class index
> ---
> what takes most of the space in object header? right - hash!
> Now, since we will have lazy become i am back to my idea of having
> extra & arbitrary properties
> per object.
>
> In a nutshell, the idea is to not store hash in an object header, but
> instead use just a single bit 'hash present'.
>
> When identity hash of object is requested (via corresponding primitive)
> the implementation could check if 'hash present' is set,
> then if it's not there , we do a 'lazy become' of existing object to same object
> copied into another place, but with hash bit set, and with extra 64-bit field,
> where hash value can be stored.
>
> So, when you requesting an identity hash for object which don't have it,
> the object of from:
> [header][...data.. ]
> copied to new memory region with new layout:
> [header][hash bits][...data..]
>
> and old object, is of course 'corpsed' to forwarding pointer to new location.

The weak point of this idea is that you might run out of memory during
the allocation of the new object if you ask for the identity hash of a larger
object, or of many smaller objects at once.

>
> Next step is going from holding just hash to having an arbitrary &
> dynamic number of extra fields per object.
> In same way, we use 1 extra bit, indicating that object having extra properties.
> And when object don't have it, we lazy-become it from being
> [header][...data.. ]
> or
> [header][hash bits][...data..]
> to:
> [header][hash bits][oop][...data..]
>
> where 'oop' can be anything - instance of Array/Dictionary (depends
> how language-side will decide to store extra properties of object)
>
> This , for instance , would allow us to store extra properties for
> special object formats like variable bytes or compiled methods, which
> don't have the instance variables.
>
> Not need to mention, how useful being able to attach extra properties
> per object, without changing the object's class.
> And , of course the freed 18 bits (20 - 2) in header can be allocated
> for other purposes.
> (Stef, how many bits you need for experiments? ;)
>
> ------------
>
> About immediates zoo.
>
> Keep in mind, that the more immediates we have, the more complex implementation
> tends to be.
>
> I would just keep 2 data types:
> - integers
> - floats
>
> and third, special 'arbitrary' immediate , which seen by VM as a 60-bit value.
> The interpretation of this value depends on lookup in range-table,
> where developer specifying the correspondence between the value
> interval and class:
> [min .. max] -> class
>
> intervals, of course, cannot overlap.
> Determining a class of such immediate might be slower - O(log2(n)) at
> best (where n is size of range table), but from other side,
> how many different kinds of immediates you can fit into 60-bit value?
> Right, it is 2^60. Much more than proposed 8 isn't? :)
>
> And this extra cost can be mitigated completely by inline cache.
> - in case of regular reference, you must fetch the object's class and
> then compare it with one, stored in cache.
> - in case of immediate reference, you compare immediate value with min
> and max stored in cache fields.
> And if value is in range, you got a cache hit, and free to proceed.
> So, its just 1 extra comparison comparing to 'classic' inline cache.
>
> And, after thinking how inline cache is organized, now you can scratch
> the first my paragraph related to  immediates!
> We really don't need to discriminate between small integers/floats/rest!!
> They could also be nothing more than just one of a range(s) defined in
> our zoo of 'special' immediates!
>
> So, at the end we will have just two kinds of references:
> - zero bit == 0 -- an object pointer
> - zero bit == 1 -- an immediate
>
> Voila!.
>
> We can have real zoo of immediates, and simple implementation to support them.
> And not saying that range-table is provided by language-side, so we're
> free to rearrange them at any moment.
>
> And of course, it doesn't means that VM could not reserve some of the
> ranges for own 'contracted'
> immediates, like Characters, and even class reference for example.
> Think about it :)
>

I like the idea, but I'm not sure how useful it will be in practice. I'd
also add characters as a third data type. String/Character operations should
be as fast as possible.


Levente

Re: [Pharo-project] Plan/discussion/communication around new object format

Igor Stasenko
 
Some extra ideas.

1. Avoiding the extra header for big-sized objects.
I'm not sure about this, but still ..

according to Eliot's design:
8: slot size (255 => extra header word with large size)

What if we extend the size field to 16 bits (so in total it will be 65536 slots)
and have a single flag indicating how to calculate the object size:

flag(0)  object size = (size field) * 8
flag(1)  object size = 2^(size field)

which means that past 2^16 (or however many bits we dedicate to the size field
in the header) all object sizes
will be powers of two.
Since most objects will fit under 2^16, we don't lose much.
For big arrays, we could have a special collection/array which will
store the exact size in its inst var (and we don't even need to care in the
cases of Sets/Dicts/OrderedCollections).
Also we can actually make it transparent:

Array class>>new: size
  size > (max exact size ) ifTrue: [ ^ ArrayWithBigSizeWhatever new: size ]

Of course, care must be taken with those variable classes which
can potentially hold large amounts of bytes (like Bitmap).
But I think code can be quickly adapted to this feature of the VM, which
will simply fail the #new: primitive
if the size is not a power of two for sizes greater than the max "exact size"
which can fit into the size field of the header.
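A toy encoding along these lines (the field widths are the post's numbers; the helper names are invented, and for simplicity this works in slot counts rather than the *8 byte scaling):

```python
# Toy version of the flagged size field: exact sizes up to 2^16 - 1 slots,
# powers of two beyond that.
MAX_EXACT = (1 << 16) - 1

def encode_size(slots):
    if slots <= MAX_EXACT:
        return (0, slots)                    # flag 0: exact slot count
    # flag 1: only powers of two are representable; #new: would fail otherwise
    assert slots & (slots - 1) == 0, "big objects must be a power of two"
    return (1, slots.bit_length() - 1)       # store the exponent

def decode_size(flag, field):
    return field if flag == 0 else 1 << field

for n in (1, 7, MAX_EXACT, 1 << 16, 1 << 20):
    flag, field = encode_size(n)
    assert decode_size(flag, field) == n
```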
----

2. Slot for arbitrary properties.
If you read carefully, Eliot said that for making lazy become work it is
necessary to always have some extra space per object, even if the object
doesn't have any fields:

<<We shall probably keep the minimum object size at 16 bytes so that
there is always room for a forwarding pointer. >>

So, this fits quite well with the idea of having a slot for dynamic
properties per object. What if, instead of "extending" the object when it
requires an extra properties slot, we just reserve the slot for
properties at the very beginning:

[ header ]
[ properties slot]
... rest of data ..

So, any object will have that slot. And in the case of lazy become, we can
use that slot for holding the forwarding pointer. Voila.

3. From 2. we go straight back to the hash. The VM doesn't need to know
such a thing as an object's hash; it has no semantic load inside the VM, the VM
just answers those bits via a single primitive.

So why is it a kind of enforced, inherent property of all objects in the
system? And why does nobody ask: if we can have that one, why could we not
have more than one, or as many as we want? This is my central question
around the idea of having per-object properties.
Once the VM guarantees that any object can have at least one slot for
storing an object reference (the property slot),
then the VM no longer needs to care about the identity hash.

Because it can be implemented completely at language size. But most of
all, we are NO longer limited
how big/small hash values , which directly converts into bonuses: less
hash collisions > more performance. Want 64-bit hash? 128-bit?
Whatever you desire:

Object>>identityHash
   ^ self propertiesAt: #hash ifAbsentPut: [ HashGenerator newHashValue ]

And once we have per-object properties and lazy become, things like
Magma will get HUGE benefits straight out of the box.
Because look: lazy become and immutability address many problems
related to OODB implementation
(I barely see other use cases where immutability would be as useful
as in the case of OODBs)..
so for me it is logical to take this last step: by adding arbitrary
properties, an OODB can store the ID there.
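A sketch of how the language side could implement this on top of the reserved slot (the #propertiesSlot getter/setter stand for hypothetical primitives that read and write the extra slot; they are not an existing API):

```smalltalk
"Hypothetical: lazily create a dictionary in the reserved per-object
 slot and look properties up there."
Object>>propertiesAt: key ifAbsentPut: aBlock
    | props |
    props := self propertiesSlot.
    props ifNil: [
        props := IdentityDictionary new.
        self propertiesSlot: props ].
    ^ props at: key ifAbsentPut: aBlock
```

With this in place, #identityHash above becomes just one more property, and an OODB's #oid could be another.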

--
Best regards,
Igor Stasenko.

Re: [Pharo-project] Plan/discussion/communication around new object format

Colin Putney-3
 

On 2012-06-11, at 1:36 AM, Igor Stasenko wrote:

Because look, lazy become, immutability - those two addressing many
problems related to OODB implementation
(i barely see other use cases, where immutability would be as useful
as in cases of OODB)..
so for me it is logical to have this last step: by adding arbitrary
properties, OODB now can store the ID there.

Well, it goes a little further than that. I think immutability is generally useful for any system that persists objects outside the image. OODBs are one example, but the same applies for ORM, LOOM-style virtual memory, or even syncing of state across the network. I've even wished for immutability when working on web applications. It's a join point for any aspects related to state.

Arbitrary properties are actually used quite a bit already, they just don't have VM support. Morphic and Tweak use them extensively, as does the dependency system. I suspect we'd find that a lot of hacks and kludges could be subsumed by VM-supported arbitrary properties. (e.g., ephemerons). 

So yeah, +1 to arbitrary properties.

Colin

Re: [Pharo-project] Plan/discussion/communication around new object format

Igor Stasenko

If you remember, I said before that I don't like immutability enforced
through a VM contract.

Imagine two frameworks using the immutability flag for their own
purposes, and contending for ownership of the same object(s)..

IMO there are other, better solutions to that, but I'm not going to
go into details...


On 11 June 2012 18:05, Colin Putney <[hidden email]> wrote:

>
>
> On 2012-06-11, at 1:36 AM, Igor Stasenko wrote:
>
> Because look, lazy become, immutability - those two addressing many
> problems related to OODB implementation
> (i barely see other use cases, where immutability would be as useful
> as in cases of OODB)..
> so for me it is logical to have this last step: by adding arbitrary
> properties, OODB now can store the ID there.
>
>
> Well, it goes a little further than that. I think immutability is generally useful for any system that persists objects outside the image. OODBs are one example, but the same applies for ORM, LOOM-style virtual memory, or even syncing of state across the network. I've even wished for immutability when working on web applications. It's a join point for any aspects related to state.
>
> Arbitrary properties are actually used quite a bit already, they just don't have VM support. Morphic and Tweak use them extensively, as does the dependency system. I suspect we'd find that a lot of hacks and kludges could be subsumed by VM-supported arbitrary properties. (e.g., ephemerons).
>
> So yeah, +1 to arbitrary properties.
>
> Colin
>



--
Best regards,
Igor Stasenko.

Re: [Pharo-project] Plan/discussion/communication around new object format

Chris Muller-3
In reply to this post by Igor Stasenko
 
> and once we could have per-object properties.. and lazy become, things
> like Magma will get a HUGE benefits straightly out of the box.

It's true -- one of the busiest occupations for Magma is maintaining
its two-way [object <--> oid] dictionary mappings.  Dropping that in
favor of a fast per-object #oid attribute would be like cutting loose
an anchor.

It seems like per-object attributes could provide some amazing
leverage in other domains too.  +1.

Re: [Pharo-project] Plan/discussion/communication around new object format

Chris Muller-3
In reply to this post by Igor Stasenko
 
> if you remember i said before, that i don't like immutability enforced
> through VM contract.
>
> imagine two frameworks, using immutability flag for their own
> purposes, and contending for ownership
> of same object(s)..
>
> IMO, there are other , better, solutions to that but i'm not going to
> go in details...

I totally agree with you here, too.  An immutability bit is like
static typing, or RDBMS referential integrity, or "truly private"
methods (to which I'm also opposed).  Each of those enforces some past
notion about that object/method/whatever on the present, even after it
should no longer be the case -- at which point you simply go into the
"meta settings" and turn it off because it got in your way.

Probably better to just avoid all that in the first place.

Re: [Pharo-project] Plan/discussion/communication around new object format

Colin Putney-3

On Mon, Jun 11, 2012 at 2:42 PM, Chris Muller <[hidden email]> wrote:

>
>> if you remember i said before, that i don't like immutability enforced
>> through VM contract.
>>
>> imagine two frameworks, using immutability flag for their own
>> purposes, and contending for ownership
>> of same object(s)..
>>
>> IMO, there are other , better, solutions to that but i'm not going to
>> go in details...
>
> I totally agree with you here, too.  An immutability-bit is like
> static-typing, or RDBMS RI, or "truly private" methods (to which I'm
> also opposed).  Each of those enforces some past notion about that
> object/method/whatever on the present, even until it should no longer
> be the case -- at which point you simply go into the "meta settings"
> and turn it off because it got in your way.
>
> Probably better to just avoid all that in the first place.

Ok, what if we call it a "fast write-barrier provided by the VM".
Would that change your view? Igor?

Colin

Re: [Pharo-project] Plan/discussion/communication around new object format

Igor Stasenko

On 11 June 2012 23:46, Colin Putney <[hidden email]> wrote:

>
> On Mon, Jun 11, 2012 at 2:42 PM, Chris Muller <[hidden email]> wrote:
>>
>>> if you remember i said before, that i don't like immutability enforced
>>> through VM contract.
>>>
>>> imagine two frameworks, using immutability flag for their own
>>> purposes, and contending for ownership
>>> of same object(s)..
>>>
>>> IMO, there are other , better, solutions to that but i'm not going to
>>> go in details...
>>
>> I totally agree with you here, too.  An immutability-bit is like
>> static-typing, or RDBMS RI, or "truly private" methods (to which I'm
>> also opposed).  Each of those enforces some past notion about that
>> object/method/whatever on the present, even until it should no longer
>> be the case -- at which point you simply go into the "meta settings"
>> and turn it off because it got in your way.
>>
>> Probably better to just avoid all that in the first place.
>
> Ok, what if we call it a "fast write-barrier provided by the VM".
> Would that change your view? Igor?
>

The problem is that it is not nearly as fast:
 - at every memory write you have to check this flag; even if it is
not used anywhere by anything at the language side, you still pay the
price.

I would understand if we wanted a dedicated VM where this flag serves
only a single purpose: to support a specific piece of software (like
an OODB) which is the only user of this flag.

Another (open) question is how to deal with immutability in the
presence of become, i.e.:

mutableCopy := immutableObject shallowCopy.
immutableObject becomeForward: mutableCopy.

This kind of thing makes immutability useless for protecting critical
parts of object memory,
like preventing modification of compiled method literals:
  yes, you cannot change the immutable object, but you can replace all
pointers to it with its own mutable copy, which is equivalent to
making it mutable again.

--
Best regards,
Igor Stasenko.

Re: [Pharo-project] Plan/discussion/communication around new object format

Bert Freudenberg
 
On 2012-06-13, at 05:27, Igor Stasenko wrote:

> Another (open) question, is how to deal with immutability in presence
> of  become, i.e.:
>
> mutableCopy := immutableObject shallowCopy.
> immutableObject becomeForward: mutableCopy.
>
> such kind of things making immutability useless for protecting some
> critical parts of object memory,
> like preventing modification of compiled method literals:
>  yes you cannot change the immutable object, but you can replace all
> pointers to it with it's own mutable copy, which is equivalent to
> making it mutable again.


Why should the VM allow become of an immutable object?

- Bert -


Re: [Pharo-project] Plan/discussion/communication around new object format

Igor Stasenko

On 13 June 2012 09:41, Bert Freudenberg <[hidden email]> wrote:

>
> On 2012-06-13, at 05:27, Igor Stasenko wrote:
>
>> Another (open) question, is how to deal with immutability in presence
>> of  become, i.e.:
>>
>> mutableCopy := immutableObject shallowCopy.
>> immutableObject becomeForward: mutableCopy.
>>
>> such kind of things making immutability useless for protecting some
>> critical parts of object memory,
>> like preventing modification of compiled method literals:
>>  yes you cannot change the immutable object, but you can replace all
>> pointers to it with it's own mutable copy, which is equivalent to
>> making it mutable again.
>
>
> Why should the VM allow become of an immutable object?
>

You can disallow this. But you can only make it harder: I can do it
manually -- take all objects pointing to the immutable one and replace
their pointers to it with its mutable copy. And that is completely
legal, except that it will be a bit harder, since it is not done by a
primitive.

Disallowing #become on immutables raises many additional questions:

What is your action when you need to migrate the instances of a class
due to its reshaping, while some of them are immutable?
(I bet there will be many other examples where this will break
existing traditional schemes, like working with proxies etc.)

I don't want to spread FUD... I just want to make sure that we are
ready to answer every such question.
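For concreteness, "doing it manually" can be sketched like this (a deliberate simplification: it enumerates the whole heap and swaps only named inst vars via the reflective #instVarAt:put:, ignoring indexed slots; the exact enumeration selectors vary between Squeak and Pharo versions):

```smalltalk
"Sketch: emulate becomeForward: without the primitive by scanning every
 object and replacing references to old with new (named inst vars only)."
Object>>manuallyForward: old to: new
    self systemNavigation allObjectsDo: [ :holder |
        1 to: holder class instSize do: [ :i |
            (holder instVarAt: i) == old
                ifTrue: [ holder instVarAt: i put: new ] ] ]
```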


> - Bert -

--
Best regards,
Igor Stasenko.

Re: [Pharo-project] Plan/discussion/communication around new object format

Andreas.Raab
 

On 13 June 2012 09:41, Bert Freudenberg <[hidden email]> wrote:

>
> On 2012-06-13, at 05:27, Igor Stasenko wrote:
>
>> Another (open) question, is how to deal with immutability in presence
>> of  become, i.e.:
>>
>> mutableCopy := immutableObject shallowCopy.
>> immutableObject becomeForward: mutableCopy.
>>
>> such kind of things making immutability useless for protecting some
>> critical parts of object memory,
>> like preventing modification of compiled method literals:
>>  yes you cannot change the immutable object, but you can replace all
>> pointers to it with it's own mutable copy, which is equivalent to
>> making it mutable again.
>
>
> Why should the VM allow become of an immutable object?
>

You can disallow this. But you can only make it harder: I can do it
manually -- take all objects pointing to the immutable one and replace
their pointers to it with its mutable copy. And that is completely
legal, except that it will be a bit harder, since it is not done by a
primitive.

You are confusing something here. Become is just a "fancier form" of assignment that you could, in fact, write without a primitive by enumerating all the references to an object. If you keep that in mind, it is obvious that, since assigning an immutable *to* a variable is never a problem, using become *with* an immutable as the argument isn't either: in both cases the immutable remains immutable.

The real question that arises is whether become should be allowed to change the *contents* of an immutable object. Personally, I think it should not, but this has some side effects that require fixing, such as class migration, which really should have a separate primitive to update instances after a reshape -- the rules for this *should* include changing immutables, unless you want to change the system to deal with multiple concurrent class versions (much pain down that path).

Disallowing #become on immutables raises many additional questions:

What is your action when you need to migrate the instances of a class
due to its reshaping, while some of them are immutable?

Class migration should really have its own primitive. If it had, much pain could be avoided in migrating classes (see comments in ClassBuilder>>update:to:). And then one could decide on the proper policy to use for immutables.

Cheers,
  - Andreas


(I bet there will be many other examples where this will break
existing traditional schemes, like working with proxies etc.)

I don't want to spread FUD... I just want to make sure that we are
ready to answer every such question.


> - Bert -

--
Best regards,
Igor Stasenko.




Re: [Pharo-project] Plan/discussion/communication around new object format

Bert Freudenberg
In reply to this post by Igor Stasenko

On 2012-06-13, at 14:10, Igor Stasenko wrote:

>
> On 13 June 2012 09:41, Bert Freudenberg <[hidden email]> wrote:
>>
>> On 2012-06-13, at 05:27, Igor Stasenko wrote:
>>
>>> Another (open) question, is how to deal with immutability in presence
>>> of  become, i.e.:
>>>
>>> mutableCopy := immutableObject shallowCopy.
>>> immutableObject becomeForward: mutableCopy.
>>>
>>> such kind of things making immutability useless for protecting some
>>> critical parts of object memory,
>>> like preventing modification of compiled method literals:
>>>  yes you cannot change the immutable object, but you can replace all
>>> pointers to it with it's own mutable copy, which is equivalent to
>>> making it mutable again.
>>
>>
>> Why should the VM allow become of an immutable object?
>>
>
> You can disallow this. But you can only make it harder: i can do it
> manually - take all objects pointing to immutable one and replace the
> pointer to it with it's mutable copy. And it is completely legal,
> except that it will be a bit harder, since done not by primitive.

Okay, I guess you're right. (although the solution would be "don't do that" rather than "immutability is useless").

But become would not replace a reference in an immutable object. Which is a major point of immutable objects: all objects "inside" an immutable object are immutable, too. (although I think even that was up for discussion last time we had this conversation)

> Disallowing #become on immutables raising many additional questions:
>
> what is your action when you need to migrate instances of a class due
> to it's reshaping, while some of them are immutable?

Interesting case. One solution would be to simply fail the class reshape if there are immutable instances. One would have to do a mutable copy + become.

> (I bet there is be many other examples, when this will break existing
> traditional schemes, like working
> with proxies etc).

It would have to be used sparingly and with care, sure.

> I don't wanna spread FUD.. just want to make sure that we ready to
> answer for every such question.


Yep. That's why we're discussing :)

- Bert -



Re: [Pharo-project] Plan/discussion/communication around new object format

Igor Stasenko
In reply to this post by Andreas.Raab

On 13 June 2012 15:16, Andreas Raab <[hidden email]> wrote:

>
>
> On 13 June 2012 09:41, Bert Freudenberg <[hidden email]> wrote:
>>
>> On 2012-06-13, at 05:27, Igor Stasenko wrote:
>>
>>> Another (open) question, is how to deal with immutability in presence
>>> of  become, i.e.:
>>>
>>> mutableCopy := immutableObject shallowCopy.
>>> immutableObject becomeForward: mutableCopy.
>>>
>>> such kind of things making immutability useless for protecting some
>>> critical parts of object memory,
>>> like preventing modification of compiled method literals:
>>>  yes you cannot change the immutable object, but you can replace all
>>> pointers to it with it's own mutable copy, which is equivalent to
>>> making it mutable again.
>>
>>
>> Why should the VM allow become of an immutable object?
>>
>
> You can disallow this. But you can only make it harder: i can do it
> manually - take all objects pointing to immutable one and replace the
> pointer to it with it's mutable copy. And it is completely legal,
> except that it will be a bit harder, since done not by primitive.
>
>
> You are confusing something here. Become is just a "fancier form" of assignment that you could in fact write without a primitive by enumerating all the references to an object. If you keep that in mind it is obvious that since assigning an immutable *to* a variable is never a problem using become *with* an immutable as argument is neither since in both cases the immutable remains immutable.
>

But you just make it harder... what prevents me from replacing, in all
objects, the pointers to an immutable (a) -- which holds a pointer to a
target immutable (t) -- with a mutable copy (b) whose reference to (t)
is replaced by a mutable copy (t')?

> The real question that arises is whether become should be allowed to change the *contents* of an immutable object. Personally, I think it should not, but this has some side effects that require fixing such as class migration which really should have a separate primitive to update its instances after reshape - the rules for this *should* include changing immutables unless you want to change the system to deal with multiple concurrent class versions (much pain down that path).
>

This is where I usually stop. IMO this already shows too high a cost
(to my taste) for introducing such restrictions and maintaining them.
I would ask first: is this worth the effort, if it is so costly in
terms of implementation?
In theory we can of course enforce anything, but in practice that
means a lot of complex code with many checks on the VM side. This is
not what I'd like to see in the first place, especially knowing that
Squeak has lived well so far without any immutability, and it does not
feel like we miss it badly.

For me, if we introduce some feature or want to enforce certain
behavior, then there should be a very strong reason for doing so, with
benefits which clearly outweigh the costs and make some things easier
to do. With an arbitrary properties slot there is very little
additional complexity on the VM side, and it is very simple to use at
the language side, replacing many crippled schemes like properties in
Morphic, Dependents in Object, etc.
In contrast, immutability adds another dimension of complexity to an
already complex soup on both the VM and language sides, and the
benefits are not so clear, to me.

> Disallowing #become on immutables raising many additional questions:
>
> what is your action when you need to migrate instances of a class due
> to it's reshaping, while some of them are immutable?
>
>
> Class migration should really have its own primitive. If it had, much pain could be avoided in migrating classes (see comments in ClassBuilder>>update:to:). And then one could decide on the proper policy to use for immutables.
>
> Cheers,
>   - Andreas
>
>
> (I bet there is be many other examples, when this will break existing
> traditional schemes, like working
> with proxies etc).
>
> I don't wanna spread FUD.. just want to make sure that we ready to
> answer for every such question.
>
>
>> - Bert -
>
> --
> Best regards,
> Igor Stasenko.
>
>



--
Best regards,
Igor Stasenko.

Re: [Pharo-project] Plan/discussion/communication around new object format

Andreas.Raab
 
Igor wrote:
On 13 June 2012 15:16, Andreas Raab <[hidden email]> wrote:

>
>
> On 13 June 2012 09:41, Bert Freudenberg <[hidden email]> wrote:
>>
>> On 2012-06-13, at 05:27, Igor Stasenko wrote:
>>
>>> Another (open) question, is how to deal with immutability in presence
>>> of  become, i.e.:
>>>
>>> mutableCopy := immutableObject shallowCopy.
>>> immutableObject becomeForward: mutableCopy.
>>>
>>> such kind of things making immutability useless for protecting some
>>> critical parts of object memory,
>>> like preventing modification of compiled method literals:
>>>  yes you cannot change the immutable object, but you can replace all
>>> pointers to it with it's own mutable copy, which is equivalent to
>>> making it mutable again.
>>
>>
>> Why should the VM allow become of an immutable object?
>>
>
> You can disallow this. But you can only make it harder: i can do it
> manually - take all objects pointing to immutable one and replace the
> pointer to it with it's mutable copy. And it is completely legal,
> except that it will be a bit harder, since done not by primitive.
>
>
> You are confusing something here. Become is just a "fancier form" of assignment that you could in fact write without a primitive by enumerating all the references to an object. If you keep that in mind it is obvious that since assigning an immutable *to* a variable is never a problem using become *with* an immutable as argument is neither since in both cases the immutable remains immutable.
>

But you just make it harder... what prevents me from replacing, in all
objects, the pointers to an immutable (a) -- which holds a pointer to a
target immutable (t) -- with a mutable copy (b) whose reference to (t)
is replaced by a mutable copy (t')?

Absolutely nothing. Why should it? Assuming that all your "pointers to immutable a" are mutable themselves you can absolutely change them. Go right ahead. You *can* replace an immutable by a mutable object, it's trivial:

  value := Object new beImmutable.
  value := value asMutableCopy.

See, done. So where is the problem? Whether you write it like that or whether you write it like here:

  value := Object new beImmutable.
  value := value become: value asMutableCopy.

is entirely irrelevant. It doesn't change the fact that the original object is still immutable after the operation and that's the only thing that counts for the VM operation. Of course the following would be somewhat different:

  immutable := (#abc -> Object new) asImmutable.
  value := immutable value beImmutable.                   "making the value immutable"
  value := value become: value asMutableCopy.

In this case, the following assertions should hold:

  self deny: value == immutable value. "assoc was frozen, no changing its value"
  self assert: value isMutable.
  self assert: immutable value isImmutable. "the frozen assoc still points at the old, immutable object"

So this all works out just fine. BTW, I think that discussing the use of become for hacking into the system is beyond the scope of this discussion. Become is used in very few situations and its semantics are necessarily such that one can do bad things with it. Duly noted, let's move on.


> The real question that arises is whether become should be allowed to change the *contents* of an immutable object. Personally, I think it should not, but this has some side effects that require fixing such as class migration which really should have a separate primitive to update its instances after reshape - the rules for this *should* include changing immutables unless you want to change the system to deal with multiple concurrent class versions (much pain down that path).
>

This is where I usually stop. IMO this already shows too high a cost
(to my taste) for introducing such restrictions and maintaining them.
I would ask first: is this worth the effort, if it is so costly in
terms of implementation?
In theory we can of course enforce anything, but in practice that
means a lot of complex code with many checks on the VM side. This is
not what I'd like to see in the first place, especially knowing that
Squeak has lived well so far without any immutability, and it does not
feel like we miss it badly.

I absolutely do. There were several situations (for example in Croquet and at Teleplace) where we changed our designs to the worse merely due to the lack of immutability support. The main thing that immutability fixes is to prevent accidental modifications of objects thought to be immutable (method literals for example), which when they happen are *extremely* hard to find. But it also gives rise to many other interesting techniques (read-only transactions etc).

Cheers,
  - Andreas


For me, if we introduce some feature or want to enforce certain
behavior, then there should be a very strong reason for doing so, with
benefits which clearly outweigh the costs and make some things easier
to do. With an arbitrary properties slot there is very little
additional complexity on the VM side, and it is very simple to use at
the language side, replacing many crippled schemes like properties in
Morphic, Dependents in Object, etc.
In contrast, immutability adds another dimension of complexity to an
already complex soup on both the VM and language sides, and the
benefits are not so clear, to me.


> Disallowing #become on immutables raising many additional questions:
>
> what is your action when you need to migrate instances of a class due
> to it's reshaping, while some of them are immutable?
>
>
> Class migration should really have its own primitive. If it had, much pain could be avoided in migrating classes (see comments in ClassBuilder>>update:to:). And then one could decide on the proper policy to use for immutables.
>
> Cheers,
>   - Andreas
>
>
> (I bet there is be many other examples, when this will break existing
> traditional schemes, like working
> with proxies etc).
>
> I don't wanna spread FUD.. just want to make sure that we ready to
> answer for every such question.
>
>
>> - Bert -
>
> --
> Best regards,
> Igor Stasenko.
>
>



--
Best regards,
Igor Stasenko.


