Spur Squeak Trunk Image Available

Eliot Miranda-2
 
Hi All,

    it gives me great pleasure to let you know that a spur-format trunk Squeak image is finally available at http://www.mirandabanda.org/files/Cog/SpurImages/.  Spur VMs are available at http://www.mirandabanda.org/files/Cog/VM/VM.r2987/.

Spur is a new object representation and garbage collector for Squeak/Pharo/Croquet/Newspeak.

Features
The object representation is significantly simpler than the existing one, and hence permits a lot of JIT optimizations, in particular allocating objects in machine code.  This speeds up new, new: et al, but also speeds up blocks because contexts and closures are now allocated in machine code.  It also provides immediate characters, so for example accessing wide strings is much faster in Spur, since characters do not have to be instantiated to represent characters with codes greater than 255.

The garbage collector has a scavenger and a global scan-mark-compact collector.  The scavenger is significantly faster than the existing pointer-reversal scan-mark-compact, hence GC performance is much improved.

The memory manager manages old space as a sequence of segments, as opposed to the single contiguous space provided by the existing memory manager.  The memory manager grows the heap a segment at a time, and can and will release empty segments back to the host OS after a full GC.  Hence Spur is able to grow the heap to the limit of available memory without one having to specify the VM's memory size at start-up.

The object representation uses "lazy forwarding" to implement become:, creating copies of objects that are becommed, and forwarding the existing objects to the copies.  While Spur still scans the stack zone on become to ensure no forwarding pointers to the receiver exist in stack frames (for check-free push and store instance variable operations), it does not scan the entire heap, catching sends to forwarded objects as part of the normal message send class checks, hence following forwarding pointers lazily, and eliminating forwarders during GC.  The existing memory manager does a full memory sweep and compact to implement become.  Hence Spur provides the performance advantages of direct pointers while providing a significantly faster become.
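
For readers less familiar with become:, its image-level effect is easy to see in a Workspace. The snippet below is only an illustrative sketch: it uses the standard become: and becomeForward: messages with made-up variable names, and it shows what every reference observes afterwards, not how Spur implements forwarding internally.

```smalltalk
| a b holder |
"Use copies so that no compiler literals are involved in the swap."
a := 'first' copy.
b := 'second' copy.
holder := Array with: a with: b.

"Two-way become: every reference to one object now refers to the other."
a become: b.
Transcript show: holder first; cr.    "second"
Transcript show: holder last; cr.     "first"

"One-way becomeForward: every reference to the receiver now refers to the argument."
a becomeForward: b.
Transcript show: (a == b) printString; cr.                       "true"
Transcript show: (holder first == holder last) printString; cr.  "true"
```

The holder array is never touched directly, yet its elements change identity; that is the behaviour the VM has to provide, whether by sweeping memory or by forwarding lazily.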

While Spur uses moving GC (scavenging and compaction on full GC), just like the existing memory manager, Spur supports pinning, the ability to stop an object from moving.  Old space objects will not be moved if pinned.  Attempting to pin a new space object causes a become, forwarding the new space object to a pinned copy in old space.  This allows simpler interfacing with foreign code through the FFI, since one can hand out references to pinned objects in the knowledge that they will not be moved by the GC.

Finally, Spur supports ephemerons in a simple and direct way, providing pre-mortem per-instance finalization.  Although the image-level support needs to be written, it should soon be possible to improve the finalization of entities such as buffered files (ensuring they are flushed before being GCed).


Future Work
Spur is as yet a work in progress.  The 32-bit implementation is usable and appears stable.  The major missing component is an incremental scan-mark GC that should eliminate long pauses due to the global scan-mark-compact GC (which is still invoked at snapshot time).  I hope to start on this soon.  But another key facet of Spur is that the object header format and the sizes of objects are common between 32 and 64 bits.  In 32-bit and 64-bit Spur, object bodies are multiples of 8 bytes, so there may be an unused slot at the end of a 32-bit object with an odd number of slots.  Hence Spur is close to providing a "true" 64-bit system, one with 61-bit SmallIntegers, and 61-bit SmallFloats (objects with the same precision, but less range than 64-bit Float, done by stealing bits from the exponent field).  I look forward to collaborating with Esteban Lorenzano on 64-bit Spur and hope that it will be available early next year.
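
The size arithmetic above can be tried in a Workspace. This is a rough sketch rather than VM code: bodyBytes is a hypothetical helper block, slot sizes of 4 and 8 bytes are assumed for 32-bit and 64-bit respectively, and the SmallInteger range assumes an ordinary two's-complement encoding of the 61 bits.

```smalltalk
| bodyBytes |
"bodyBytes is a hypothetical helper: slot count times slot size, rounded up to a multiple of 8."
bodyBytes := [:slots :bytesPerSlot | (slots * bytesPerSlot) roundUpTo: 8].

Transcript show: (bodyBytes value: 3 value: 4) printString; cr.  "16: 12 bytes of slots plus one unused 4-byte slot"
Transcript show: (bodyBytes value: 4 value: 4) printString; cr.  "16: no padding needed"
Transcript show: (bodyBytes value: 3 value: 8) printString; cr.  "24: 8-byte slots are always a multiple of 8"

"Assuming two's complement, a 61-bit SmallInteger spans -(2^60) .. (2^60) - 1."
Transcript show: (2 raisedTo: 60) negated printString; cr.
Transcript show: ((2 raisedTo: 60) - 1) printString; cr.
```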


Experience
I am of course interested in reports of performance effects.  Under certain, hopefully rare circumstances, Spur may actually be slower (one is when the number of processes involved in process switching exceeds the number of stack pages in the stack zone).  But my limited experience is that Spur is significantly faster than the existing VM.  Please post experiences, both positive and negative.

Finally, caveat emptor!  This is alpha code.  Bugs may result in image corruption.  If you do use Spur, please try and back up your work just in case.  And if anything does go wrong please let me know, preferably providing a reproducible case.


Enjoy!
Eliot Miranda

Re: Spur Squeak Trunk Image Available

douglas mcpherson
 
Congratulations! This is /very/ exciting news for Squeak and Squeak-family Smalltalks. 

Doug

On Jun 12, 2014, at 16:41 , Eliot Miranda wrote:

[snip]


Re: Spur Squeak Trunk Image Available

Ryan Macnak
In reply to this post by Eliot Miranda-2
 
I see there are Newspeak Spur VMs. Are there Newspeak Spur boot images to go with them? :)



Re: Spur Squeak Trunk Image Available

Eliot Miranda-2
 


On Jun 12, 2014, at 9:22 PM, Ryan Macnak <[hidden email]> wrote:

I see there are Newspeak Spur VMs. Are there Newspeak Spur boot images to go with them? :)

There are.  We should be ready to push the spur boot soon.  Give us a few days to put it through its paces.




Re: [Pharo-dev] Spur Squeak Trunk Image Available

Eliot Miranda-2
In reply to this post by Eliot Miranda-2
 
Hi Philippe,


On Tue, Jun 17, 2014 at 1:03 AM, Philippe Marschall <[hidden email]> wrote:
On 13.06.14 01:41, Eliot Miranda wrote:




Hi All,

     it gives me great pleasure to let you know that a spur-format trunk
Squeak image is finally available at
http://www.mirandabanda.org/files/Cog/SpurImages/.  Spur VMs are
available at http://www.mirandabanda.org/files/Cog/VM/VM.r2987/.



I'm seeing a Seaside request handling benchmark going from 10k req/s to 11k req/s.
Don't be too quick to dismiss this as being IO-bound (being IO-bound is actually quite hard on Squeak/Pharo). During the benchmark Squeak fully saturates one core. It is hard to tell what the limiting factor for this benchmark actually is. But removing one or two String allocations from the request handling loop usually yields about 100 to 200 additional req/s.

That's good news at least.  One thing you can try is the VMProfiler.  It's an interactive Morphic application, so it will try and open, and may crash.  Load the VMProfiler and use it via

    VMProfiler openInstance
        spyOn: [...];
        report: aStream.

e.g.

    VMProfiler openInstance
        spyOn: [1 tinyBenchmarks];
        report: (Transcript cr; yourself).
    Transcript flush.

produces:

/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.spur/Fast.app/Contents/MacOS/Squeak  6/17/2014 
eden size: 2,603,344  stack pages: 160  code size: 1,048,576

1 tinyBenchmarks

gc prior.  clear prior.  
6.298 seconds; sampling frequency 1473 hz
9231 samples in the VM (9275 samples in the entire program)  99.53% of total

8798 samples in generated vm code 95.31% of entire vm (94.86% of total)
433 samples in vanilla vm code   4.69% of entire vm (  4.67% of total)

% of generated vm code (% of total) (samples) (cumulative)
61.29%  (58.13%)  Integer>>benchFib         (5392)  (61.29%)
13.47%  (12.78%)  Integer>>benchmark        (1185)  (74.76%)
11.48%  (10.89%)  Object>>at:put:           (1010)  (86.24%)
 9.60%  ( 9.11%)  SmallInteger>>+           (845)   (95.84%)
 3.99%  ( 3.78%)  Object>>at:               (351)   (99.83%)
 0.07%  ( 0.06%)  ceBaseFram...Trampoline   (6)     (99.90%)
 0.03%  ( 0.03%)  ceMethodAbort0Args        (3)     (99.93%)
 0.03%  ( 0.03%)  Sequenceabl...om:to:put:  (3)     (99.97%)
 0.02%  ( 0.02%)  Magnitude>>min:           (2)     (99.99%)
 0.01%  ( 0.01%)  ceEnterCog...ceiverReg    (1)     (100.0%)


% of vanilla vm code (% of total) (samples) (cumulative)
32.33%  ( 1.51%)  primitiveStringReplace               (140)  (32.33%)
24.25%  ( 1.13%)  copyAndForward                       (105)  (56.58%)
13.63%  ( 0.64%)  marryFrameSP                         (59)   (70.21%)
10.85%  ( 0.51%)  instantiateClassindexableSize        (47)   (81.06%)
 3.46%  ( 0.16%)  ceBaseFrameReturn                    (15)   (84.53%)
 2.08%  ( 0.10%)  scavengeLoop                         (9)    (86.61%)
 1.62%  ( 0.08%)  checkForEvent...ontextSwitch         (7)    (88.22%)
 1.39%  ( 0.06%)  externalEnsureIsBaseFrame            (6)    (89.61%)
 1.39%  ( 0.06%)  voidVMStateForSn...nalPrimitivesIf   (6)    (90.99%)
 1.15%  ( 0.05%)  mapInterpreterOops                   (5)    (92.15%)
 1.15%  ( 0.05%)  returnToExecuti...tContextSwitch     (5)    (93.30%)
 1.15%  ( 0.05%)  scavengeReferentsOf                  (5)    (94.46%)
 0.92%  ( 0.04%)  handleStackOverflow                  (4)    (95.38%)
 0.92%  ( 0.04%)  signed64BitIntegerFor                (4)    (96.30%)
 0.46%  ( 0.02%)  primitiveShortAt                     (2)    (96.77%)
 0.46%  ( 0.02%)  ceNonLocalReturn                     (2)    (97.23%)
 0.23%  ( 0.01%)  gMoveCwR                             (1)    (97.46%)
 0.23%  ( 0.01%)  genInnerPrimit...CharacterinReg      (1)    (97.69%)
 0.23%  ( 0.01%)  genPrimitiveMultiply                 (1)    (97.92%)
 0.23%  ( 0.01%)  primitiveInflateDecompressBlock      (1)    (98.15%)
 0.23%  ( 0.01%)  primitiveNewWithArg                  (1)    (98.38%)
 0.23%  ( 0.01%)  primitiveTestAnd...fCriticalSection  (1)    (98.61%)
 0.23%  ( 0.01%)  processWeaklings                     (1)    (98.85%)
 0.23%  ( 0.01%)  scavengingGCTenuringIf               (1)    (99.08%)
 0.23%  ( 0.01%)  updateMaybeObjRefAt                  (1)    (99.31%)
 0.23%  ( 0.01%)  wakeHighestPriority                  (1)    (99.54%)
 0.23%  ( 0.01%)  ceCannotResume                       (1)    (99.77%)
 0.23%  ( 0.01%)  ioPositionOfWindowSetxy              (1)    (100.0%)



**Memory**
old +574,744 bytes

**GCs**
full 0 totalling 0ms (0% elapsed time)
scavenges 330 totalling 93ms (1.477% elapsed time), avg 0.282ms
tenures 0
root table 0 overflows

**Compiled Code Compactions**
0 totalling 0ms (0% elapsed time)

**Events**
Process switches 66 (10 per second)
ioProcessEvents calls 314 (50 per second)
Interrupt checks 3462 (550 per second)
Event checks 3585 (569 per second)
Stack overflows 421897 (66989 per second)
Stack page divorces 0 (0 per second)

        

While this profile is flat, it does show where the VM is spending its time, and since the VM is not something with deep call chains, this view is the most useful I have.
The VMProfiler is part of the CogTools package at http://source.squeak.org/VMMaker.
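
When reading a report like the one above, it can help to recompute a few of the derived figures from the raw counts. The following Workspace sketch does that with numbers copied from the report; the variable names are made up for illustration, and the results should agree with the percentages the profiler printed, to within rounding.

```smalltalk
| elapsedSeconds totalSamples vmSamples jitSamples scavengeMs scavenges |
"All figures are copied from the report above."
elapsedSeconds := 6.298.
totalSamples := 9275.
vmSamples := 9231.
jitSamples := 8798.
scavengeMs := 93.
scavenges := 330.

Transcript
    show: 'sampling frequency (hz): ';
    show: (totalSamples / elapsedSeconds) rounded printString; cr;
    show: 'VM samples, % of total: ';
    show: (vmSamples / totalSamples * 100.0 printShowingDecimalPlaces: 2); cr;
    show: 'generated code, % of VM samples: ';
    show: (jitSamples / vmSamples * 100.0 printShowingDecimalPlaces: 2); cr;
    show: 'scavenging, % of elapsed time: ';
    show: (scavengeMs / (elapsedSeconds * 1000) * 100.0 printShowingDecimalPlaces: 3); cr;
    show: 'average scavenge (ms): ';
    show: (scavengeMs / scavenges * 1.0 printShowingDecimalPlaces: 3); cr.
```

Compare the output with the report's 1473 hz, 99.53%, 95.31%, 1.477% and 0.282ms.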

P.S. Extending the size of the DefaultTabsArray will result in a less cramped report.

HTH
Eliot