Hi All,
it gives me great pleasure to let you know that a spur-format trunk Squeak image is finally available at http://www.mirandabanda.org/files/Cog/SpurImages/. Spur VMs are available at http://www.mirandabanda.org/files/Cog/VM/VM.r2987/.
Spur is a new object representation and garbage collector for Squeak/Pharo/Croquet/Newspeak.
Features
The object representation is significantly simpler than the existing one, and hence permits a lot of JIT optimizations, in particular allocating objects in machine code. This speeds up new, new: et al, but also speeds up blocks because contexts and closures are now allocated in machine code. It also provides immediate characters, so for example accessing wide strings is much faster in Spur, since characters do not have to be instantiated to represent characters with codes greater than 255.
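To illustrate why immediate characters avoid allocation, here is a minimal Python sketch of a tagged-word encoding. The tag width, the tag value and the helper names are assumptions made for illustration; the post does not state Spur's actual tag assignments. The point is only that the code point lives in the reference word itself, so reading an element of a wide string does not need to create an object.

```python
# Hypothetical tag layout, for illustration only; the real Spur tag
# assignments are not given in this post.
TAG_BITS = 2
TAG_MASK = (1 << TAG_BITS) - 1
CHAR_TAG = 0b10  # assumed tag value marking an immediate character


def encode_char(code_point: int) -> int:
    """Pack a code point into a tagged word; no heap object is created."""
    return (code_point << TAG_BITS) | CHAR_TAG


def is_immediate_char(word: int) -> bool:
    """True if the low tag bits mark this word as an immediate character."""
    return (word & TAG_MASK) == CHAR_TAG


def decode_char(word: int) -> int:
    """Recover the code point from a tagged character word."""
    assert is_immediate_char(word)
    return word >> TAG_BITS


euro = encode_char(0x20AC)  # a character with a code greater than 255
```

Decoding is a shift and encoding is a shift plus an or, which is the kind of work that is cheap enough to do inline in machine code.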
The garbage collector has a scavenger and a global scan-mark-compact collector. The scavenger is significantly faster than the existing pointer-reversal scan-mark-compact, hence GC performance is much improved.
The memory manager manages old space as a sequence of segments, as opposed to the single contiguous space provided by the existing memory manager. The memory manager grows the heap a segment at a time, and can and will release empty segments back to the host OS after a full GC. Hence Spur is able to grow the heap to the limit of available memory without one having to specify the VM's memory size at start-up.
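The segment behaviour described above can be sketched as a toy model in Python. Everything here (the `Segment` and `OldSpace` classes, the fixed `SEGMENT_SIZE`, counting objects instead of bytes) is a hypothetical simplification rather than the VM's data structures, and the model leaves out compaction entirely. It only shows the two policies named in the text: grow by one segment when nothing fits, and release segments that are empty after a full GC.

```python
from dataclasses import dataclass, field

SEGMENT_SIZE = 16  # hypothetical capacity of one segment, in objects


@dataclass
class Segment:
    live: list = field(default_factory=list)

    def free_slots(self) -> int:
        return SEGMENT_SIZE - len(self.live)


class OldSpace:
    def __init__(self):
        self.segments = []

    def allocate(self, obj) -> None:
        for seg in self.segments:
            if seg.free_slots() > 0:
                seg.live.append(obj)
                return
        # No room in any existing segment: grow the heap by one segment.
        seg = Segment()
        seg.live.append(obj)
        self.segments.append(seg)

    def full_gc(self, is_live) -> int:
        """Drop dead objects, then release segments that ended up empty.

        Returns the number of segments handed back to the host OS.
        """
        for seg in self.segments:
            seg.live = [o for o in seg.live if is_live(o)]
        before = len(self.segments)
        self.segments = [s for s in self.segments if s.live]
        return before - len(self.segments)
```

Because the heap is a list of segments rather than one contiguous block, there is no size to fix at start-up: the list simply gets longer or shorter.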
The object representation uses "lazy forwarding" to implement become:, creating copies of objects that are becommed, and forwarding the existing objects to the copies. While Spur still scans the stack zone on become to ensure no forwarding pointers to the receiver exist in stack frames (for check-free push and store instance variable operations), it does not scan the entire heap, catching sends to forwarded objects as part of the normal message send class checks, hence following forwarding pointers lazily, and eliminating forwarders during GC. The existing memory manager does a full memory sweep and compact to implement become. Hence Spur provides the performance advantages of direct pointers while providing a significantly faster become.
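As a rough illustration of lazy forwarding, the following Python sketch models a heap as a table of cells addressed by integers. The `Heap` class, its method names and the "heal the slot on send" step are assumptions chosen to keep the example small; the real VM detects forwarders through the send-time class check and also scans the stack zone, neither of which is modelled here. What the sketch does show is that become: touches only the two objects involved and that references are fixed up when they are next used.

```python
class Heap:
    def __init__(self):
        self.cells = {}      # address -> ("object", value) or ("forwarder", address)
        self.next_addr = 0

    def allocate(self, value) -> int:
        addr = self.next_addr
        self.next_addr += 1
        self.cells[addr] = ("object", value)
        return addr

    def become(self, a: int, b: int) -> None:
        """Swap the identities of a and b without scanning the heap."""
        copy_a = self.allocate(self.cells[a][1])
        copy_b = self.allocate(self.cells[b][1])
        # The old objects become forwarders to the other object's copy.
        self.cells[a] = ("forwarder", copy_b)
        self.cells[b] = ("forwarder", copy_a)

    def follow(self, addr: int) -> int:
        """Chase forwarding pointers until a real object is reached."""
        kind, payload = self.cells[addr]
        while kind == "forwarder":
            addr = payload
            kind, payload = self.cells[addr]
        return addr

    def send(self, holder: dict, slot: str):
        """Model of a send: a forwarder fails the check, so fix the slot and retry."""
        addr = holder[slot]
        if self.cells[addr][0] == "forwarder":
            addr = self.follow(addr)
            holder[slot] = addr  # heal the stale reference lazily
        return self.cells[addr][1]
```

In this model the cost of become: is two allocations and two header writes, independent of heap size; the remaining forwarders would be swept away by the next GC.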
While Spur uses moving GC (scavenging and compaction on full GC), just like the existing memory manager, Spur supports pinning, the ability to stop an object from moving. Old space objects will not be moved if pinned. Attempting to pin a new space object causes a become, forwarding the new space object to a pinned copy in old space. This allows simpler interfacing with foreign code through the FFI, since one can hand out references to pinned objects in the knowledge that they will not be moved by the GC.
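The pinning rule can be pictured with a small Python model. The `Obj` class and the `pin` and `movable` functions are hypothetical names, not the image-level or VM API. The sketch assumes only what the paragraph states: an old space object is pinned in place, while pinning a new space object produces a pinned copy in old space and forwards the original to it.

```python
class Obj:
    def __init__(self, value, in_new_space=True):
        self.value = value
        self.in_new_space = in_new_space
        self.pinned = False
        self.forwarded_to = None


def pin(obj: Obj) -> Obj:
    """Pin obj and return the object callers should use from now on."""
    if obj.in_new_space:
        # New space is scavenged, so pin a copy in old space and
        # forward the original to it (a become, in the post's terms).
        copy = Obj(obj.value, in_new_space=False)
        copy.pinned = True
        obj.forwarded_to = copy
        return copy
    obj.pinned = True
    return obj


def movable(old_space: list) -> list:
    """Objects a compactor is allowed to move: the unpinned ones."""
    return [o for o in old_space if not o.pinned]
```

Foreign code would be handed the address of the object returned by `pin`, never the original new space object.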
Finally Spur supports ephemerons in a simple and direct way, providing pre-mortem per-instance finalization. Although the image-level support needs to be written, it should soon be possible to improve the finalization of entities such as buffered files (ensuring they are flushed before being GCed), etc.
Future Work
Spur is as yet a work in progress. The 32-bit implementation is usable and appears stable. The major missing component is an incremental scan-mark GC that should eliminate long pauses due to the global scan-mark-compact GC (which is still invoked at snapshot time). I hope to start on this soon. But another key facet of Spur is that the object header format and the sizes of objects are common between 32- and 64-bits. In 32-bit and 64-bit Spur, object bodies are multiples of 8 bytes, so there may be an unused slot at the end of a 32-bit object with an odd number of slots. Hence Spur is close to providing a "true" 64-bit system, one with 61-bit SmallIntegers, and 61-bit SmallFloats (objects with the same precision, but less range than 64-bit Float, done by stealing bits from the exponent field). I look forward to collaborating with Esteban Lorenzano on 64-bit Spur and hope that it will be available early next year.
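The size arithmetic in the paragraph above can be worked through in a few lines of Python. The function names are made up for this example, and the object header is ignored; only the body rounding and the two's-complement range of a 61-bit integer are computed.

```python
def body_bytes(num_slots: int, word_bytes: int) -> int:
    """Size of an object body, rounded up to a multiple of 8 bytes."""
    raw = num_slots * word_bytes
    return (raw + 7) & ~7


def unused_slots(num_slots: int, word_bytes: int) -> int:
    """Padding slots left at the end of the body after rounding."""
    padding = body_bytes(num_slots, word_bytes) - num_slots * word_bytes
    return padding // word_bytes


def small_integer_range(bits: int) -> tuple:
    """Two's-complement range of a SmallInteger that is `bits` wide."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


# A 3-slot object with 4-byte slots needs 12 bytes, rounded up to 16,
# which leaves one unused slot; with 8-byte slots nothing is wasted.
three_slot_32 = (body_bytes(3, 4), unused_slots(3, 4))
three_slot_64 = (body_bytes(3, 8), unused_slots(3, 8))
min61, max61 = small_integer_range(61)
```

With 61 bits the SmallInteger range is -2^60 to 2^60 - 1, that is, up to 1,152,921,504,606,846,975.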
Experience
I am of course interested in reports of performance effects. Under certain, hopefully rare circumstances, Spur may actually be slower (one is when the number of processes involved in process switching exceeds the number of stack pages in the stack zone). But my limited experience is that Spur is significantly faster than the existing VM. Please post experiences, both positive and negative.
Finally, caveat emptor! This is alpha code. Bugs may result in image corruption. If you do use Spur, please try and back up your work just in case. And if anything does go wrong please let me know, preferably providing a reproducible case.
Enjoy! Eliot Miranda
|
Thanks eliot.
As soon as it is available for Pharo we will try with some large moose images.
Stef
On 13/6/14 01:41, Eliot Miranda wrote:
> Hi All,
> …
|
On Thu, Jun 12, 2014 at 10:48 PM, stepharo <[hidden email]> wrote:
> Thanks eliot.
OK, but just so you know I'm not going to do the Pharo bootstrap. I hope someone in your team will have a go. I'm happy to help but I don't know Pharo well enough to do this. It's a matter of making sure the Monticello packages can be edited. The Squeak code may just work but I wouldn't know.
best, Eliot
|
On 13 Jun 2014, at 04:29, Eliot Miranda <[hidden email]> wrote:
yes, I’m on it :) Esteban
|
On 13 Jun 2014, at 06:44, Esteban Lorenzano <[hidden email]> wrote:
I’m already building the PharoVM with Spur :) but that’s just the beginning… I need to make it NB compatible first (a small step, I will finish today or tomorrow) and I need to convert the images… that will take more time (and I will need some help… cough, cough, Camille, cough, cough, Guille) :P Esteban
|
Hi Esteban,
On Fri, Jun 13, 2014 at 6:31 AM, Esteban Lorenzano <[hidden email]> wrote:
let me know if I can help... I'm curious whether the basic image bootstrap works out-of-the-box or not (SpurBootstrap bootstrapImage: 'imageBasename'). Yesterday I wrote scripts to automate the Squeak process, and these were used to produce the Squeak trunk Spur image; see
buildspurtrunkimage.sh
BuildSqueakTrunkImage.st
WriteSpurPackagesToTempDir.st
LoadSpurPackagesFromTempDir.st
Most of this is scripting the Monticello package patch and load.
best, Eliot
|
In reply to this post by Eliot Miranda-2
On 13.06.14 01:41, Eliot Miranda wrote:
> Hi All,
>
> it gives me great pleasure to let you know that a spur-format trunk Squeak image is finally available at http://www.mirandabanda.org/files/Cog/SpurImages/. Spur VMs are available at http://www.mirandabanda.org/files/Cog/VM/VM.r2987/.
>
> …
I'm seeing a Seaside request handling benchmark going from 10k req/s to 11k req/s. Don't be too quick to dismiss this as being IO-bound (being IO-bound is actually quite hard on Squeak/Pharo). During the benchmark Squeak fully saturates one core. It is hard to tell what the limiting factor for this benchmark actually is. But removing one or two String allocations from the request handling loop usually yields about 100 to 200 additional req/s.
Cheers
Philippe
|
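To put the throughput figures above on a per-request footing, here is a short Python calculation. It assumes, as the message suggests, that a single saturated core handles requests one after another, so time per request is simply the inverse of throughput; the function names are made up for this example.

```python
def per_request_us(req_per_s: float) -> float:
    """Average CPU time per request in microseconds on one saturated core."""
    return 1_000_000 / req_per_s


def saving_us(before: float, after: float) -> float:
    """Per-request time saved when throughput rises from before to after."""
    return per_request_us(before) - per_request_us(after)


spur_gain = saving_us(10_000, 11_000)           # about 9.1 microseconds
one_alloc_low = saving_us(10_000, 10_100)       # about 1.0 microsecond
one_alloc_high = saving_us(10_000, 10_200)      # about 2.0 microseconds
```

Under that assumption the move to Spur saves roughly 9 microseconds per request, several times what removing one or two String allocations buys.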
Hi Philippe,
On Tue, Jun 17, 2014 at 1:03 AM, Philippe Marschall <[hidden email]> wrote:
That's good news at least. One thing you can try is the VMProfiler. It's an interactive Morphic application so it will try and open and may crash. Load the VMProfiler and use it via
    VMProfiler openInstance spyOn: [...]; report: aStream.
e.g.
    VMProfiler openInstance spyOn: [1 tinyBenchmarks];
        report: (Transcript cr; yourself).
    Transcript flush.
produces:
/Users/eliot/Cog/oscogvm/build.macos32x86/squeak.cog.spur/Fast.app/Contents/MacOS/Squeak 6/17/2014
eden size: 2,603,344 stack pages: 160 code size: 1,048,576
1 tinyBenchmarks
gc prior. clear prior.
6.298 seconds; sampling frequency 1473 hz
9231 samples in the VM (9275 samples in the entire program) 99.53% of total
8798 samples in generated vm code 95.31% of entire vm (94.86% of total)
433 samples in vanilla vm code 4.69% of entire vm ( 4.67% of total)
% of generated vm code (% of total) (samples) (cumulative)
61.29% (58.13%) Integer>>benchFib (5392) (61.29%)
13.47% (12.78%) Integer>>benchmark (1185) (74.76%)
11.48% (10.89%) Object>>at:put: (1010) (86.24%)
9.60% ( 9.11%) SmallInteger>>+ (845) (95.84%)
3.99% ( 3.78%) Object>>at: (351) (99.83%)
0.07% ( 0.06%) ceBaseFram...Trampoline (6) (99.90%)
0.03% ( 0.03%) ceMethodAbort0Args (3) (99.93%)
0.03% ( 0.03%) Sequenceabl...om:to:put: (3) (99.97%)
0.02% ( 0.02%) Magnitude>>min: (2) (99.99%)
0.01% ( 0.01%) ceEnterCog...ceiverReg (1) (100.0%)
% of vanilla vm code (% of total) (samples) (cumulative)
32.33% ( 1.51%) primitiveStringReplace (140) (32.33%)
24.25% ( 1.13%) copyAndForward (105) (56.58%)
13.63% ( 0.64%) marryFrameSP (59) (70.21%)
10.85% ( 0.51%) instantiateClassindexableSize (47) (81.06%)
3.46% ( 0.16%) ceBaseFrameReturn (15) (84.53%)
2.08% ( 0.10%) scavengeLoop (9) (86.61%)
1.62% ( 0.08%) checkForEvent...ontextSwitch (7) (88.22%)
1.39% ( 0.06%) externalEnsureIsBaseFrame (6) (89.61%)
1.39% ( 0.06%) voidVMStateForSn...nalPrimitivesIf (6) (90.99%)
1.15% ( 0.05%) mapInterpreterOops (5) (92.15%)
1.15% ( 0.05%) returnToExecuti...tContextSwitch (5) (93.30%)
1.15% ( 0.05%) scavengeReferentsOf (5) (94.46%)
0.92% ( 0.04%) handleStackOverflow (4) (95.38%)
0.92% ( 0.04%) signed64BitIntegerFor (4) (96.30%)
0.46% ( 0.02%) primitiveShortAt (2) (96.77%)
0.46% ( 0.02%) ceNonLocalReturn (2) (97.23%)
0.23% ( 0.01%) gMoveCwR (1) (97.46%)
0.23% ( 0.01%) genInnerPrimit...CharacterinReg (1) (97.69%)
0.23% ( 0.01%) genPrimitiveMultiply (1) (97.92%)
0.23% ( 0.01%) primitiveInflateDecompressBlock (1) (98.15%)
0.23% ( 0.01%) primitiveNewWithArg (1) (98.38%)
0.23% ( 0.01%) primitiveTestAnd...fCriticalSection (1) (98.61%)
0.23% ( 0.01%) processWeaklings (1) (98.85%)
0.23% ( 0.01%) scavengingGCTenuringIf (1) (99.08%)
0.23% ( 0.01%) updateMaybeObjRefAt (1) (99.31%)
0.23% ( 0.01%) wakeHighestPriority (1) (99.54%)
0.23% ( 0.01%) ceCannotResume (1) (99.77%)
0.23% ( 0.01%) ioPositionOfWindowSetxy (1) (100.0%)
**Memory**
old +574,744 bytes
**GCs**
full 0 totalling 0ms (0% elapsed time)
scavenges 330 totalling 93ms (1.477% elapsed time), avg 0.282ms
tenures 0
root table 0 overflows
**Compiled Code Compactions**
0 totalling 0ms (0% elapsed time)
**Events**
Process switches 66 (10 per second)
ioProcessEvents calls 314 (50 per second)
Interrupt checks 3462 (550 per second)
Event checks 3585 (569 per second)
Stack overflows 421897 (66989 per second)
Stack page divorces 0 (0 per second)
While this profile is flat, it does show where the VM is spending its time, and since the VM is not something with deep call chains, this view is the most useful I have.
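For anyone reading such a report for the first time, the percentages can be recomputed from the raw sample counts. The Python below uses only numbers copied from the report above; the variable names and the `pct` helper are made up for this example and are not part of the VMProfiler.

```python
# Sample counts and timings copied from the report above.
samples_total = 9275      # samples in the entire program
samples_vm = 9231         # samples in the VM
samples_generated = 8798  # samples in generated (JIT) code
samples_vanilla = 433     # samples in the interpreter and runtime
bench_fib = 5392          # samples in Integer>>benchFib
elapsed_ms = 6298
scavenges = 330
scavenge_ms = 93


def pct(part: float, whole: float, digits: int = 2) -> float:
    """Percentage of whole taken up by part, rounded for display."""
    return round(100 * part / whole, digits)


vm_of_total = pct(samples_vm, samples_total)                 # 99.53
generated_of_vm = pct(samples_generated, samples_vm)         # 95.31
generated_of_total = pct(samples_generated, samples_total)   # 94.86
fib_of_generated = pct(bench_fib, samples_generated)         # 61.29
scavenge_share = pct(scavenge_ms, elapsed_ms, 3)             # 1.477
avg_scavenge_ms = round(scavenge_ms / scavenges, 3)          # 0.282
```

The first percentage on each row is relative to its own section (generated or vanilla code) and the bracketed one is relative to all samples, which is why the two columns differ.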
The VMProfiler is part of the CogTools package at http://source.squeak.org/VMMaker. P.S. Extending the size of the DefaultTabsArray will result in a less cramped report.
HTH Eliot |
On Tue, Jun 17, 2014 at 10:29 AM, Eliot Miranda <[hidden email]> wrote:
What I meant to say is that it may fail if the morph can't open. It doesn't routinely fail :-)
best, Eliot
|