Squeak and Pharo speed differences


Re: [Pharo-dev] Squeak and Pharo speed differences


Another GUI-input speed observation:


If you click the scroll wheel once to shift the text contents of the pane, the visual effect is very fast. I cannot detect any latency; it is certainly nowhere near what I see for text insertion/deletion, cursoring, and double-click selection. The slower operations all involve getting a damage rect at a specific point based on cursor position or click position. The faster scrolling function doesn’t need to do that. It just grabs the entire visible rectangle minus one line, and blits it shifted down by one line, along with the one new line. That looks like an almost zero-cost collection of damage rects. It’s a simple, fast calculation. Collection of damage rects for the slower operations looks much more expensive. The events involved in both cases are delivered to the target handler with the same latency. The slowness or quickness seems to arise after that.





Hence graphics output necessarily lags input on Morphic. So these speed differences have nothing to do with VM performance and everything to do with GUI architecture.


Both Squeak and Pharo show the same text-selection latency. The architecture difference is not likely causing that.


Given that both Pharo and Squeak use Morphic and hence both have the same render-in-step architecture, isn’t the fact that they show the same performance issue evidence that points to precisely this being the cause?


Yes, but not architecture, by which I think you mean the pushing of events versus the fixed-frequency regular loop in Morphic. I would expect a big variation in the Morphic case, but I don’t know what the fixed frequency is; it could well be under the noise floor. My first thought would be that getting the damage rects is the problem, but I’ve not seen the code.


How do we index or look up the word rectangle to render? I think that is more likely the cause. Is a map created at method compile time and updated after text is moved during edits?


My understanding is that damage rectangles are retrieved,


Right, but this is the potentially slow part: the retrieving, or perhaps more specifically, mapping a point to a rectangle containing a contiguous sequence of non-whitespace characters (a word).


combined to produce a smaller (non-overlapping?) set, and that the entire morph tree is asked to render within these damage rectangles. You can read the code for yourself.


It’ll be a while.
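
For anyone who wants a feel for the "combined to produce a smaller set" step before reading the Morphic source, here is a minimal workspace sketch. It is not Morphic's actual damage recorder. It only relies on Rectangle>>intersects: and Rectangle>>merge:, and the sample rectangles and variable names are made up for illustration.

```smalltalk
"Illustration only, NOT the Morphic implementation: coalesce overlapping
 damage rectangles so that each dirty region is repainted once."
| damage merged |
damage := OrderedCollection new.
damage add: (10@10 corner: 60@30).      "hypothetical damage around a word"
damage add: (50@20 corner: 120@40).     "overlaps the first rectangle"
damage add: (300@300 corner: 340@320).  "disjoint from the others"

merged := OrderedCollection new.
damage do: [:rect |
	| acc hit |
	acc := rect.
	"Keep absorbing already-merged rectangles that touch the accumulator."
	[hit := merged detect: [:m | m intersects: acc] ifNone: [nil].
	 hit notNil] whileTrue: [
		merged remove: hit.
		acc := acc merge: hit].
	merged add: acc].
merged  "inspect this: pairwise non-overlapping rectangles covering all damage"
```

With these inputs the first two rectangles coalesce into one bounding rectangle and the third stays separate. Note that merging by bounding box can repaint more pixels than were actually damaged, so a real implementation has to trade the number of rectangles against the painted area.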



I just tried some new experiments.  I should have thought of these earlier.  Character insertion and cursoring in any direction by one character have the same latency.  Collecting the damage rectangle at the cursor position and around the selected word are both taking about the same time as far as I can tell with my eye, and this time is longer than in VW or any Windows app.   But VW doesn’t use the Windows message queue directly.  All incoming Windows events are converted to Smalltalk equivalents and are queued on the Smalltalk side.  And it works well.   Why not mimic that pattern to get the extra speed?  Does something in Squeak/Pharo architecture prevent us from doing that?


How do we set a multi-process time profiler running so that we don’t need to eval blocks to get tallies? I just want to use the editor and watch the method hit distribution. I see the Time Profiler window; it seems to need a code snippet to work.



How is the JIT code cache cleared?


Dialect dependent. In Squeak/Pharo/Cuis, IIRC, Smalltalk voidCogVMState. Can’t remember how it’s done in VW.


CompiledMethod allInstancesWeakly do: [:compiledMethod | compiledMethod flushCachedVMCode]



Baseline state:  the only thing that comes to mind here is Collect All Garbage.


There’s also Smalltalk garbageCollectMost which just runs a scavenge.  IIRC someInstance has a side effect of running a scavenge in VW.
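
Putting the pieces mentioned so far together, a baseline run might look like the sketch below. It assumes Squeak-style selectors (Smalltalk garbageCollect, Smalltalk voidCogVMState, Time microsecondsToRun:); in Pharo some of these may live on `Smalltalk vm` instead, so check implementors in your image. The workload block is a made-up stand-in for whatever you actually want to measure.

```smalltalk
"Sketch only: put the image in a known state, then time a cold and a warm run."
| workload cold warm |
workload := [(1 to: 50000) inject: 0 into: [:sum :each | sum + (each \\ 7)]].  "hypothetical workload"

Smalltalk garbageCollect.     "full GC, so leftover garbage does not skew the run"
Smalltalk voidCogVMState.     "discard jitted machine code (selector as recalled above)"

cold := Time microsecondsToRun: workload.   "first run pays for JIT compilation"
warm := Time microsecondsToRun: workload.   "second run should hit the code cache"

Transcript show: 'cold: ', cold printString, ' us, warm: ', warm printString, ' us'; cr.
```

Comparing the cold and warm figures gives a rough idea of how much of a measured delay is JIT warm-up rather than the code under test.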





and then ensure, through the relevant introspection primitives,


What are these?  What state features am I introspecting after the test?  Sizes of heap subspaces?  I can do Time microsecondsToRun: on the blocks.


In Squeak/Pharo/Cuis see Smalltalk vmParameterAt: or Smalltalk vm parameterAt: and senders.
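
As a rough illustration of what "introspecting after the test" can mean, the sketch below reads a few GC counters before and after a block and prints the differences. It uses Smalltalk vmParameterAt: as named above (in Pharo, Smalltalk vm parameterAt:); the indices 7, 9 and 10 are the full-GC count, scavenge count and scavenge milliseconds from the parameter list. The allocation-heavy block is only a placeholder.

```smalltalk
"Sketch only: GC activity caused by a block, via VM parameter deltas."
| fullBefore scavBefore scavMsBefore junk |
fullBefore := Smalltalk vmParameterAt: 7.     "full GCs since startup"
scavBefore := Smalltalk vmParameterAt: 9.     "incremental GCs / scavenges since startup"
scavMsBefore := Smalltalk vmParameterAt: 10.  "total ms spent in those"

junk := (1 to: 200000) collect: [:i | i printString].  "placeholder workload"

Transcript
	show: 'full GCs: ', ((Smalltalk vmParameterAt: 7) - fullBefore) printString; cr;
	show: 'scavenges: ', ((Smalltalk vmParameterAt: 9) - scavBefore) printString; cr;
	show: 'scavenge ms: ', ((Smalltalk vmParameterAt: 10) - scavMsBefore) printString; cr.
```

If the deltas are zero for the operation you care about, GC is unlikely to explain the latency and the time is going somewhere else.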


Okay.  I see this list:


parameterAt: parameterIndex
	"parameterIndex is a positive integer corresponding to one of the VM's internal
	parameter/metric registers.  Answer with the current value of that register.
	Fail if parameterIndex has no corresponding register.
	VM parameters are numbered as follows:
	1	end (v3)/size (Spur) of old-space (0-based, read-only)
	2	end (v3)/size (Spur) of young/new-space (read-only)
	3	end (v3)/size (Spur) of heap (read-only)
	4	nil (was allocationCount (read-only))
	5	nil (was allocations between GCs (read-write))
	6	survivor count tenuring threshold (read-write)
	7	full GCs since startup (read-only)
	8	total milliseconds in full GCs since startup (read-only)
	9	incremental GCs (SqueakV3) or scavenges (Spur) since startup (read-only)
	10	total milliseconds in incremental GCs (SqueakV3) or scavenges (Spur) since startup (read-only)
	11	tenures of surviving objects since startup (read-only)
	12-20	were specific to ikp's JITTER VM, now 12-19 are open for use
	20	utc microseconds at VM start-up (actually at time initialization, which precedes image load).
	21	root table size (read-only)
	22	root table overflows since startup (read-only)
	23	bytes of extra memory to reserve for VM buffers, plugins, etc (stored in image file header).
	24	memory threshold above which shrinking object memory (rw)
	25	memory headroom when growing object memory (rw)
	26	interruptChecksEveryNms - force an ioProcessEvents every N milliseconds (rw)
	27	number of times mark loop iterated for current IGC/FGC (read-only) includes ALL marking
	28	number of times sweep loop iterated for current IGC/FGC (read-only)
	29	number of times make forward loop iterated for current IGC/FGC (read-only)
	30	number of times compact move loop iterated for current IGC/FGC (read-only)
	31	number of grow memory requests (read-only)
	32	number of shrink memory requests (read-only)
	33	number of root table entries used for current IGC/FGC (read-only)
	34	number of allocations done before current IGC/FGC (read-only)
	35	number of survivor objects after current IGC/FGC (read-only)
	36	millisecond clock when current IGC/FGC completed (read-only)
	37	number of marked objects for Roots of the world, not including Root Table entries for current IGC/FGC (read-only)
	38	milliseconds taken by current IGC (read-only)
	39	Number of finalization signals for Weak Objects pending when current IGC/FGC completed (read-only)
	40	BytesPerOop for this image
	41	imageFormatVersion for the VM
	42	number of stack pages in use
	43	desired number of stack pages (stored in image file header, max 65535)
	44	size of eden, in bytes
	45	desired size of eden, in bytes (stored in image file header)
	46	machine code zone size, in bytes (Cog only; otherwise nil)
	47	desired machine code zone size (stored in image file header; Cog only; otherwise nil)
	48	various header flags. See getCogVMFlags.
	49	max size the image promises to grow the external semaphore table to (0 sets to default, which is 256 as of writing)
	50-51	nil; reserved for VM parameters that persist in the image (such as eden above)
	52	root table capacity
	53	number of segments (Spur only; otherwise nil)
	54	total size of free old space (Spur only, otherwise nil)
	55	ratio of growth and image size at or above which a GC will be performed post scavenge
	56	number of process switches since startup (read-only)
	57	number of ioProcessEvents calls since startup (read-only)
	58	number of ForceInterruptCheck calls since startup (read-only)
	59	number of check event calls since startup (read-only)
	60	number of stack page overflows since startup (read-only)
	61	number of stack page divorces since startup (read-only)
	62	compiled code compactions since startup (read-only; Cog only; otherwise nil)
	63	total milliseconds in compiled code compactions since startup (read-only; Cog only; otherwise nil)
	64	the number of methods that currently have jitted machine-code
	65	whether the VM supports a certain feature, MULTIPLE_BYTECODE_SETS is bit 0, IMMTABILITY is bit 1
	66	the byte size of a stack page
	67	the max allowed size of old space (Spur only; nil otherwise; 0 implies no limit except that of the underlying platform)
	68	the average number of live stack pages when scanned by GC (at scavenge/gc/become et al)
	69	the maximum number of live stack pages when scanned by GC (at scavenge/gc/become et al)
	70	the vmProxyMajorVersion (the interpreterProxy VM_MAJOR_VERSION)
	71	the vmProxyMinorVersion (the interpreterProxy VM_MINOR_VERSION)"
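
From that list, the entries that look most relevant to the input-latency question are the event-polling and JIT ones. A small workspace sketch to print them is given below; it assumes the Squeak spelling Smalltalk vmParameterAt: (Pharo: Smalltalk vm parameterAt:), and the labels are just paraphrases of the comment above. Some values may be nil on a non-Cog VM.

```smalltalk
"Sketch only: print a handful of VM parameters related to event polling and the JIT."
#(	#(26 'interruptChecksEveryNms')
	#(46 'machine code zone size (bytes)')
	#(56 'process switches since startup')
	#(57 'ioProcessEvents calls since startup')
	#(64 'methods with jitted machine code')
) do: [:pair |
	Transcript
		show: pair first printString;
		show: '  ';
		show: pair last;
		show: ': ';
		show: (Smalltalk vmParameterAt: pair first) printString;
		cr].
```

Evaluating it before and after a burst of typing or selecting shows how many ioProcessEvents calls and process switches the activity caused, which is a cheap first check before reaching for a profiler.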