Spur Squeak Trunk Image Available

Eliot Miranda-2
Hi All,

    it gives me great pleasure to let you know that a spur-format trunk Squeak image is finally available at http://www.mirandabanda.org/files/Cog/SpurImages/.  Spur VMs are available at http://www.mirandabanda.org/files/Cog/VM/VM.r2987/.

Spur is a new object representation and garbage collector for Squeak/Pharo/Croquet/Newspeak.

Features
The object representation is significantly simpler than the existing one, and hence permits a lot of JIT optimizations, in particular allocating objects in machine code.  This speeds up new, new: et al, but also speeds up blocks because contexts and closures are now allocated in machine code.  It also provides immediate characters, so for example accessing wide strings is much faster in Spur, since characters do not have to be instantiated to represent characters with codes greater than 255.
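
To illustrate what "immediate" means here, the following is a minimal Python sketch of a tagged character word. The tag width (`TAG_BITS`) and bit pattern (`CHARACTER_TAG`) are assumptions chosen for the example and are not stated in this post; the point is only that the character's code lives in the reference itself, so nothing has to be allocated for codes above 255.

```python
# Toy model of an immediate (tagged) character. The tag width and bit
# pattern are assumptions for illustration, not taken from the post.
TAG_BITS = 2
TAG_MASK = (1 << TAG_BITS) - 1
CHARACTER_TAG = 0b10


def encode_character(code_point):
    """Pack a code point into a tagged word; no heap object is created."""
    return (code_point << TAG_BITS) | CHARACTER_TAG


def is_immediate_character(oop):
    """True if the word carries the (assumed) character tag."""
    return (oop & TAG_MASK) == CHARACTER_TAG


def decode_character(oop):
    """Recover the code point from a tagged character word."""
    return oop >> TAG_BITS


euro = encode_character(0x20AC)  # a code point above 255
```

Encoding the same code point twice yields the same word, so in this model identity comparison of characters reduces to comparing integers.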

The garbage collector has a scavenger and a global scan-mark-compact collector.  The scavenger is significantly faster than the existing pointer-reversal scan-mark-compact, hence GC performance is much improved.

The memory manager manages old space as a sequence of segments, as opposed to the single contiguous space provided by the existing memory manager.  The memory manager grows the heap a segment at a time, and can and will release empty segments back to the host OS after a full GC.  Hence Spur is able to grow the heap to the limit of available memory without one having to specify the VM's memory size at start-up.
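
The segment-at-a-time behaviour can be pictured with a toy Python model. This is a sketch only, not the VM's code: `SEGMENT_BYTES`, `Segment`, `OldSpace` and their methods are hypothetical names, and liveness is tracked by a simple byte counter rather than by a real collector. What it takes from the description above is that old space is a list of segments, that the heap grows by adding one segment, and that empty segments are given back after a full GC.

```python
# Toy model of a segmented old space. Names and sizes are illustrative
# assumptions, not Spur's actual data structures.
SEGMENT_BYTES = 1024  # hypothetical segment size


class Segment:
    def __init__(self, size):
        self.size = size
        self.used = 0  # bytes handed out from this segment
        self.live = 0  # bytes still reachable

    def try_allocate(self, nbytes):
        if self.used + nbytes > self.size:
            return False
        self.used += nbytes
        self.live += nbytes
        return True


class OldSpace:
    def __init__(self):
        self.segments = []  # no single contiguous heap, just segments

    def allocate(self, nbytes):
        """Return the segment that holds the new allocation."""
        for seg in self.segments:
            if seg.try_allocate(nbytes):
                return seg
        # No room anywhere: grow the heap by exactly one segment.
        seg = Segment(max(SEGMENT_BYTES, nbytes))
        self.segments.append(seg)
        seg.try_allocate(nbytes)
        return seg

    def release(self, seg, nbytes):
        """Model nbytes in seg becoming unreachable."""
        seg.live -= nbytes

    def full_gc(self):
        """Drop segments with no live data (returned to the host OS)."""
        self.segments = [s for s in self.segments if s.live > 0]
```

Because growth happens inside `allocate`, nothing in this model needs a maximum heap size up front, which is the property the paragraph above describes.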

The object representation uses "lazy forwarding" to implement become:, creating copies of objects that are becommed, and forwarding the existing objects to the copies.  While Spur still scans the stack zone on become to ensure no forwarding pointers to the receiver exist in stack frames (for check-free push and store instance variable operations), it does not scan the entire heap, catching sends to forwarded objects as part of the normal message send class checks, hence following forwarding pointers lazily, and eliminating forwarders during GC.  The existing memory manager does a full memory sweep and compact to implement become.  Hence Spur provides the performance advantages of direct pointers while providing a significantly faster become.
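
A rough Python sketch of the lazy-forwarding idea follows. It is a simplified model under stated assumptions: it shows only one direction of forwarding, the stack-zone scan is left out, and `Obj`, `forward`, `send` and `gc_eliminate_forwarders` are hypothetical names rather than VM functions. The class check in `send` stands in for the normal send-time class check mentioned above.

```python
class Obj:
    """A heap object; `forward_to` is set once it has been forwarded."""

    def __init__(self, class_name, **slots):
        self.class_name = class_name
        self.slots = slots
        self.forward_to = None


def follow(obj):
    """Chase forwarding pointers until a real object is reached."""
    while obj.forward_to is not None:
        obj = obj.forward_to
    return obj


def forward(old, copy):
    """Turn `old` into a forwarder to `copy` without scanning the heap."""
    old.class_name = "Forwarder"
    old.slots = {}
    old.forward_to = copy


def send(receiver, selector):
    """The class check on a send is where a forwarder gets noticed."""
    if receiver.class_name == "Forwarder":
        receiver = follow(receiver)
    return f"{receiver.class_name}>>{selector}"


def gc_eliminate_forwarders(objects):
    """During GC, rewrite slots that still point at forwarders."""
    for obj in objects:
        for name, value in list(obj.slots.items()):
            if isinstance(value, Obj):
                obj.slots[name] = follow(value)
```

In this model a holder keeps its stale reference after `forward`, sends through it still reach the copy, and the reference is only rewritten when `gc_eliminate_forwarders` runs.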

While Spur uses moving GC (scavenging and compaction on full GC), just like the existing memory manager, Spur supports pinning, the ability to stop an object from moving.  Old space objects will not be moved if pinned.  Attempting to pin a new space object causes a become, forwarding the new space object to a pinned copy in old space.  This allows simpler interfacing with foreign code through the FFI, since one can hand out references to pinned objects in the knowledge that they will not be moved by the GC.
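
The pinning rule can be sketched in a few lines of Python. Again this is a toy model: the `space` and `pinned` fields and the `pin` and `may_move` functions are made-up names, and the "become" is reduced to setting a forwarding pointer on the new-space original.

```python
class Obj:
    def __init__(self, payload, space="new"):
        self.payload = payload
        self.space = space      # "new" or "old"
        self.pinned = False
        self.forward_to = None  # set if the object has been forwarded


def pin(obj):
    """Return the object that ends up pinned."""
    if obj.space == "old":
        obj.pinned = True       # old-space objects are pinned in place
        return obj
    # New-space objects get a pinned copy in old space; the original
    # is forwarded to that copy.
    copy = Obj(obj.payload, space="old")
    copy.pinned = True
    obj.forward_to = copy
    return copy


def may_move(obj):
    """A collector in this model must leave pinned objects where they are."""
    return not obj.pinned
```

Foreign code would be handed the object returned by `pin`, since that is the one guaranteed not to move in this model.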

Finally Spur supports ephemerons in a simple and direct way, providing pre-mortem per-instance finalization.  Although the image-level support needs to be written, it should soon be possible to improve the finalization of entities such as buffered files (ensuring they are flushed before being GCed), etc.


Future Work
Spur is as yet a work in progress.  The 32-bit implementation is usable and appears stable.  The major missing component is an incremental scan-mark GC that should eliminate long pauses due to the global scan-mark-compact GC (which is still invoked at snapshot time).  I hope to start on this soon.  But another key facet of Spur is that the object header format and the sizes of objects are common between 32- and 64-bits.  In 32-bit and 64-bit Spur, object bodies are multiples of 8 bytes, so there may be an unused slot at the end of a 32-bit object with an odd number of slots. Hence Spur is close to providing a "true" 64-bit system, one with 61-bit SmallIntegers, and 61-bit SmallFloats (objects with the same precision as, but less range than, a 64-bit Float, done by stealing bits from the exponent field).  I look forward to collaborating with Esteban Lorenzano on 64-bit Spur and hope that it will be available early next year.
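
The arithmetic behind these figures can be worked through in Python. This is a sketch under assumptions: it covers only the object body (no header, no minimum object size), it assumes SmallIntegers are signed two's-complement values, and it uses the standard IEEE 754 double layout (1 sign bit, 11 exponent bits, 52 mantissa bits) to work out how many exponent bits remain in a 61-bit SmallFloat that keeps full precision.

```python
def body_bytes(num_slots, word_bytes):
    """Body size rounded up to a multiple of 8 bytes."""
    raw = num_slots * word_bytes
    return (raw + 7) & ~7


def unused_slots(num_slots, word_bytes):
    """Padding slots left at the end of the body."""
    padding = body_bytes(num_slots, word_bytes) - num_slots * word_bytes
    return padding // word_bytes


# 61-bit signed SmallInteger range (two's complement assumed).
SMALLINT_BITS = 61
SMALLINT_MIN = -(1 << (SMALLINT_BITS - 1))
SMALLINT_MAX = (1 << (SMALLINT_BITS - 1)) - 1

# IEEE 754 double: 1 sign + 11 exponent + 52 mantissa bits.
DOUBLE_SIGN_BITS = 1
DOUBLE_EXPONENT_BITS = 11
DOUBLE_MANTISSA_BITS = 52

# Same precision means the mantissa is untouched, so the bits that are
# given up must come out of the exponent.
SMALLFLOAT_BITS = 61
SMALLFLOAT_EXPONENT_BITS = (
    SMALLFLOAT_BITS - DOUBLE_SIGN_BITS - DOUBLE_MANTISSA_BITS
)
STOLEN_EXPONENT_BITS = DOUBLE_EXPONENT_BITS - SMALLFLOAT_EXPONENT_BITS
```

With 4-byte slots, a 3-slot body takes 16 bytes and leaves one slot unused, while with 8-byte slots no padding is ever needed. The SmallFloat calculation leaves 8 exponent bits, i.e. 3 fewer than a double.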


Experience
I am of course interested in reports of performance effects.  Under certain, hopefully rare circumstances, Spur may actually be slower (one is when the number of processes involved in process switching exceeds the number of stack pages in the stack zone).  But my limited experience is that Spur is significantly faster than the existing VM.  Please post experiences, both positive and negative.

Finally, caveat emptor!  This is alpha code.  Bugs may result in image corruption.  If you do use Spur, please try and back up your work just in case.  And if anything does go wrong please let me know, preferably providing a reproducible case.


Enjoy!
Eliot Miranda

Re: Spur Squeak Trunk Image Available

stepharo
Thanks eliot.
As soon as it is available for Pharo we will try with some large moose
images.

Stef

Re: Spur Squeak Trunk Image Available

Eliot Miranda-2


On Thu, Jun 12, 2014 at 10:48 PM, stepharo <[hidden email]> wrote:
> Thanks eliot.
> As soon as it is available for Pharo we will try with some large moose images.

OK, but just so you know I'm not going to do the Pharo bootstrap.  I hope someone in your team will have a go.  I'm happy to help but I don't know Pharo well enough to do this.  It's a matter of making sure the Monticello packages can be edited.  The Squeak code may just work but I wouldn't know.
 

--
best,
Eliot

Re: Spur Squeak Trunk Image Available

EstebanLM

On 13 Jun 2014, at 04:29, Eliot Miranda <[hidden email]> wrote:

> OK, but just so you know I'm not going to do the Pharo bootstrap.  I hope someone in your team will have a go.  I'm happy to help but I don't know Pharo well enough to do this.  It's a matter of making sure the Monticello packages can be edited.  The Squeak code may just work but I wouldn't know.

yes, I’m on it :)

Esteban

Re: Spur Squeak Trunk Image Available

EstebanLM

On 13 Jun 2014, at 06:44, Esteban Lorenzano <[hidden email]> wrote:

> yes, I’m on it :)

I’m already building the PharoVM with Spur :)
but that’s just the beginning… I need to make it NB compatible first (a small step, I will finish today or tomorrow)
and I need to convert the images… that will take more time (and I will need some help… cough, cough, Camille, cough, cough, Guille) :P

Esteban

Re: Spur Squeak Trunk Image Available

Eliot Miranda-2
Hi Esteban,


On Fri, Jun 13, 2014 at 6:31 AM, Esteban Lorenzano <[hidden email]> wrote:

> I’m already building the PharoVM with Spur :)
> but that’s just the beginning… I need to make it NB compatible first (a small step, I will finish today or tomorrow)
> and I need to convert the images… that will take more time (and I will need some help… cough, cough, Camille, cough, cough, Guille) :P

let me know if I can help...  I'm curious whether the basic image bootstrap works out-of-the-box or not
(SpurBootstrap bootstrapImage: 'imageBasename').

Yesterday I wrote scripts to automate the Squeak process, and these were used to produce the Squeak trunk Spur image, see

buildspurtrunkimage.sh
BuildSqueakTrunkImage.st
WriteSpurPackagesToTempDir.st
LoadSpurPackagesFromTempDir.st

Most of this is scripting the Monticello package patch and load.

Esteban


Esteban

 

Stef


On 13/6/14 01:41, Eliot Miranda wrote:
Hi All,

    it gives me great pleasure to let you know that a spur-format trunk Squeak image is finally available athttp://www.mirandabanda.org/files/Cog/SpurImages/.  Spur VMs are available at http://www.mirandabanda.org/files/Cog/VM/VM.r2987/.

Spur is a new object representation and garbage collector for Squeak/Pharo/Croquet/Newspeak.

Features
The object representation is significantly simpler than the existing one, and hence permits a lot of JIT optimizations, in particular allocating objects in machine code.  This speeds up new, new: et al, but also speeds up blocks because contexts and closures are now allocated in machine code.  It also provides immediate characters, so for example accessing wide strings is much faster in Spur, since characters do not have to be instantiated to represent characters with codes greater than 255.

The garbage collector has a scavenger and a global scan-mark-compact collector.  The scavenger is significantly faster than the existing pointer-reversal scan-mark-compact, hence GC performance is much improved.

The memory manager manages old space as a sequence of segments, as opposed to the single contiguous space provided by the existing memory manager.  The memory manager grows the heap a segment at a time, and can and will release empty segments back to the host OS after a full GC.  Hence Spur is able to grow the heap to the limit of available memory without one having to specify the VM's memory size at start-up.

The object representation uses "lazy forwarding" to implement become:, creating copies of objects that are becommed, and forwarding the existing objects to the copies.  While Spur still scans the stack zone on become to ensure no forwarding pointers to the receiver exist in stack frames (for check-free push and store instance variable operations), it does not scan the entire heap, catching sends to forwarded objects as part of the normal message send class checks, hence following forwarding pointers lazily, and eliminating forwarders during GC.  The existing memory manager does a full memory sweep and compact to implement become.  Hence Spur provides the performance advantages of direct pointers while providing a significantly faster become.

While Spur uses moving GC (scavenging and compaction on full GC), just like the existing memory manager, Spur supports pinning, the ability to stop an object from moving.  Old space objects will not be moved if pinned.  Attempting to pin a new space object causes a become, forwarding the new space object to a pinned copy in old space.  This allows simpler interfacing with foreign code through the FFI, since one can hand out references to pinned objects in the knowledge that they will not be moved by the GC.

Finally Spur supports ephemerons in a simple and direct way, providing pre-mortem per-instance finalization.  Although the image-level support needs to be written, it should soon be possible to improve the finalization of entities such as buffered files (ensuring they are flushed before being GCed), etc.


Future Work
Spur is as yet a work in progress.  The 32-bit implementation is usable and appears stable.  The major missing component is an incremental scan-mark GC that should eliminate long pauses due to the global scan-mark-compact GC (which is still invoked at snapshot time).  I hope to start on this soon.  But another key facet of Spur is that the object header format and the sizes of objects are common between 32- and 64-bits.  In 32-bit and 64-bit Spur, object bodies are multiples of 8 bytes, so there may be an unused slot at the end of a 32-bit object with an odd number of slots. Hence Spur is close to providing a "true" 64-bit system, one with 61-bit SmallIntegers, and 61-bit SmallFloats (objects with the same precision, but less range that 64-bit Float, done by stealing bits from the exponent field).  I look forward to collaborating with Esteban Lorenzano on 64-bit Spur and hope that it will be available early next year.


Experience
I am of course interested in reports of performance effects.  Under certain, hopefully rare circumstances, Spur may actually be slower (one is when the number of processes involved in process switching exceeds the number of stack pages in the stack zone).  But my limited experience is that Spur is significantly faster than the existing VM.  Please post experiences, both positive and negative.

Finally, caveat emptor!  This is alpha code.  Bugs may result in image corruption.  If you do use Spur, please try and back up your work just in case.  And if anything does go wrong please let me know, preferably providing a reproducible case.


Enjoy!
Eliot Miranda





-- 
best,
Eliot






Re: Spur Squeak Trunk Image Available

Philippe Marschall-2
In reply to this post by Eliot Miranda-2
On 13.06.14 01:41, Eliot Miranda wrote:

> Hi All,
>
>      it gives me great pleasure to let you know that a spur-format trunk
> Squeak image is finally available at
> http://www.mirandabanda.org/files/Cog/SpurImages/.  Spur VMs are
> available at http://www.mirandabanda.org/files/Cog/VM/VM.r2987/.
>
> …

I'm seeing a Seaside request handling benchmark going from 10k req/s to
11k req/s.
Don't be too quick to dismiss this as being IO-bound (being IO-bound is
actually quite hard on Squeak/Pharo). During the benchmark Squeak fully
saturates one core. It is hard to tell what the limiting factor for this
benchmark actually is. But removing one or two String allocations from
the request handling loop usually yields about 100 to 200 additional req/s.
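
To make "removing a String allocation" concrete, here is a generic sketch. It is not Seaside's code, and the response fragments are invented for illustration. It builds the same String once by repeated concatenation and once through a single stream.

```smalltalk
| parts concatenated streamed |
parts := #('HTTP/1.1 ' '200' ' ' 'OK').

"Each , answers a fresh String, so this allocates one intermediate per element."
concatenated := parts inject: '' into: [:acc :each | acc, each].

"A stream appends into one growing buffer and copies the result out once."
streamed := String streamContents: [:s |
    parts do: [:each | s nextPutAll: each]].

Transcript cr; show: 'same result: ', (concatenated = streamed) printString.
Transcript flush.
```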

Cheers
Philippe




Re: Spur Squeak Trunk Image Available

Eliot Miranda-2
Hi Philippe,


On Tue, Jun 17, 2014 at 1:03 AM, Philippe Marschall <[hidden email]> wrote:
> On 13.06.14 01:41, Eliot Miranda wrote:
>
>> Hi All,
>>
>>      it gives me great pleasure to let you know that a spur-format trunk
>> Squeak image is finally available at
>> http://www.mirandabanda.org/files/Cog/SpurImages/.  Spur VMs are
>> available at http://www.mirandabanda.org/files/Cog/VM/VM.r2987/.
>
> I'm seeing a Seaside request handling benchmark going from 10k req/s to 11k req/s.
> Don't be too quick to dismiss this as being IO-bound (being IO-bound is actually quite hard on Squeak/Pharo). During the benchmark Squeak fully saturates one core. It is hard to tell what the limiting factor for this benchmark actually is. But removing one or two String allocations from the request handling loop usually yields about 100 to 200 additional req/s.

That's good news at least.  One thing you can try is the VMProfiler.  It's an interactive Morphic application, so it will try to open and may crash.  Load the VMProfiler and use it via

    VMProfiler openInstance
        spyOn: [...];
        report: aStream.

e.g.

    VMProfiler openInstance
        spyOn: [1 tinyBenchmarks];
        report: (Transcript cr; yourself).
    Transcript flush.

produces:

/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.spur/Fast.app/Contents/MacOS/Squeak  6/17/2014 
eden size: 2,603,344  stack pages: 160  code size: 1,048,576

1 tinyBenchmarks

gc prior.  clear prior.  
6.298 seconds; sampling frequency 1473 hz
9231 samples in the VM (9275 samples in the entire program)  99.53% of total

8798 samples in generated vm code 95.31% of entire vm (94.86% of total)
433 samples in vanilla vm code   4.69% of entire vm (  4.67% of total)

% of generated vm code (% of total) (samples) (cumulative)
61.29%    (58.13%) Integer>>benchFib (5392) (61.29%)
13.47%    (12.78%) Integer>>benchmark (1185) (74.76%)
11.48%    (10.89%) Object>>at:put: (1010) (86.24%)
  9.60%    (  9.11%) SmallInteger>>+ (845) (95.84%)
  3.99%    (  3.78%) Object>>at: (351) (99.83%)
  0.07%    (  0.06%) ceBaseFram...Trampoline(6) (99.90%)
  0.03%    (  0.03%) ceMethodAbort0Args (3) (99.93%)
  0.03%    (  0.03%) Sequenceabl...om:to:put: (3) (99.97%)
  0.02%    (  0.02%) Magnitude>>min: (2) (99.99%)
  0.01%    (  0.01%) ceEnterCog...ceiverReg (1) (100.0%)


% of vanilla vm code (% of total) (samples) (cumulative)
32.33%    (  1.51%) primitiveStringReplace (140) (32.33%)
24.25%    (  1.13%) copyAndForward (105) (56.58%)
13.63%    (  0.64%) marryFrameSP (59) (70.21%)
10.85%    (  0.51%) instantiateClassindexableSize (47) (81.06%)
  3.46%    (  0.16%) ceBaseFrameReturn (15) (84.53%)
  2.08%    (  0.10%) scavengeLoop (9) (86.61%)
  1.62%    (  0.08%) checkForEvent...ontextSwitch (7) (88.22%)
  1.39%    (  0.06%) externalEnsureIsBaseFrame (6) (89.61%)
  1.39%    (  0.06%) voidVMStateForSn...nalPrimitivesIf(6) (90.99%)
  1.15%    (  0.05%) mapInterpreterOops (5) (92.15%)
  1.15%    (  0.05%) returnToExecuti...tContextSwitch(5) (93.30%)
  1.15%    (  0.05%) scavengeReferentsOf (5) (94.46%)
  0.92%    (  0.04%) handleStackOverflow (4) (95.38%)
  0.92%    (  0.04%) signed64BitIntegerFor (4) (96.30%)
  0.46%    (  0.02%) primitiveShortAt (2) (96.77%)
  0.46%    (  0.02%) ceNonLocalReturn (2) (97.23%)
  0.23%    (  0.01%) gMoveCwR (1) (97.46%)
  0.23%    (  0.01%) genInnerPrimit...CharacterinReg (1) (97.69%)
  0.23%    (  0.01%) genPrimitiveMultiply (1) (97.92%)
  0.23%    (  0.01%) primitiveInflateDecompressBlock(1) (98.15%)
  0.23%    (  0.01%) primitiveNewWithArg (1) (98.38%)
  0.23%    (  0.01%) primitiveTestAnd...fCriticalSection(1) (98.61%)
  0.23%    (  0.01%) processWeaklings (1) (98.85%)
  0.23%    (  0.01%) scavengingGCTenuringIf (1) (99.08%)
  0.23%    (  0.01%) updateMaybeObjRefAt (1) (99.31%)
  0.23%    (  0.01%) wakeHighestPriority (1) (99.54%)
  0.23%    (  0.01%) ceCannotResume (1) (99.77%)
  0.23%    (  0.01%) ioPositionOfWindowSetxy (1) (100.0%)



**Memory**
old +574,744 bytes

**GCs**
full 0 totalling 0ms (0% elapsed time)
scavenges 330 totalling 93ms (1.477% elapsed time), avg 0.282ms
tenures 0
root table 0 overflows

**Compiled Code Compactions**
0 totalling 0ms (0% elapsed time)

**Events**
Process switches 66 (10 per second)
ioProcessEvents calls 314 (50 per second)
Interrupt checks 3462 (550 per second)
Event checks 3585 (569 per second)
Stack overflows 421897 (66989 per second)
Stack page divorces 0 (0 per second)

        

While this profile is flat, it does show where the VM is spending its time, and since the VM is not something with deep call chains, this view is the most useful I have.
The VMProfiler is part of the CogTools package at http://source.squeak.org/VMMaker.
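
As a reading aid, the percentages in the report can be recomputed from its raw sample counts. The sketch below uses only numbers copied from the report above. The block name pct is illustrative, and printShowingDecimalPlaces: is assumed to be available on Number, as it is in current Squeak images.

```smalltalk
| pct |
"Percentage of part in whole, printed with two decimal places."
pct := [:part :whole | (part / whole * 100.0) printShowingDecimalPlaces: 2].

Transcript
    cr; show: 'generated code / VM samples:    ', (pct value: 8798 value: 9231), '%';
    cr; show: 'generated code / total samples: ', (pct value: 8798 value: 9275), '%';
    cr; show: 'vanilla code / VM samples:      ', (pct value: 433 value: 9231), '%';
    cr; show: 'scavenge time / elapsed time:   ', (pct value: 93 value: 6298), '%';
    cr; show: 'average scavenge:               ',
        ((93 / 330.0) printShowingDecimalPlaces: 3), ' ms'.
Transcript flush.
```

The results should agree with the report up to rounding. The first column of each table is relative to that table's own sample count, and the parenthesised column is relative to all 9275 samples.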

P.S. Extending the size of the DefaultTabsArray will result in a less cramped report.

HTH
Eliot

Re: Spur Squeak Trunk Image Available

Eliot Miranda-2



On Tue, Jun 17, 2014 at 10:29 AM, Eliot Miranda <[hidden email]> wrote:
> Hi Philippe,
>
> On Tue, Jun 17, 2014 at 1:03 AM, Philippe Marschall <[hidden email]> wrote:
>> On 13.06.14 01:41, Eliot Miranda wrote:
>>
>>> Hi All,
>>>
>>>      it gives me great pleasure to let you know that a spur-format trunk
>>> Squeak image is finally available at
>>> http://www.mirandabanda.org/files/Cog/SpurImages/.  Spur VMs are
>>> available at http://www.mirandabanda.org/files/Cog/VM/VM.r2987/.
>>
>> I'm seeing a Seaside request handling benchmark going from 10k req/s to 11k req/s.
>> Don't be too quick to dismiss this as being IO-bound (being IO-bound is actually quite hard on Squeak/Pharo). During the benchmark Squeak fully saturates one core. It is hard to tell what the limiting factor for this benchmark actually is. But removing one or two String allocations from the request handling loop usually yields about 100 to 200 additional req/s.
>
> That's good news at least.  One thing you can try is the VMProfiler.  It's an interactive Morphic application, so it will try to open and may crash.  Load the VMProfiler and use it via

What I meant to say is that it may fail if the morph can't open.  It doesn't routinely fail :-)

 




--
best,
Eliot