"Eliot Miranda" <[hidden email]> wrote in message
news:[hidden email]...
> ...
> Interesting! [or alternatively "Ouch!", ed] The VW oldSpace allocator,
> used to allocate tenured objects (which is what this "let's keep tons of
> objects around" test stresses), in VW's case is poor w.r.t. a classic
> blue-book implementation, because VW doesn't organize its oldSpace free
> lists as an objectTableEntry (ote) holding onto an objectBody. Instead it
> keeps separate lists of free otes and objectBodies. So allocating an
> oldSpace object requires unlinking an ote from one free list and an
> objectBody from another. Further, the allocation code is not at all
> aggressively inlined and involves at least three procedure calls.
>
> Blair, if you're comfortable discussing it, what oldSpace free list
> organization does D5 use?

It too would have to allocate from two separate lists, one for the header
and one for the body, but it's not a classic generational collector so it
would perhaps do better on this unnatural test. Like I say, I don't think
it's very relevant to normal application performance. What is interesting,
however, is the speed of the Java collectors, which are presumably more
similar in design to VW's?

Regards

Blair
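[Editorial aside: the two-free-list organization Eliot describes can be
sketched roughly as below. This is a toy model for illustration only, not
VW's actual code; all class and field names here are invented.]

```java
// Toy model of the oldSpace free-list organization described above:
// free object-table entries (otes) and free object bodies live on two
// separate lists, so every tenured allocation must unlink one node from
// each list, versus a single unlink in a blue-book-style scheme where
// the ote and body travel together.
final class ObjectTableEntry {
    ObjectBody body;          // filled in at allocation time
    ObjectTableEntry next;    // link on the free-ote list
}

final class ObjectBody {
    int sizeInWords;
    ObjectBody next;          // link on the free-body list
}

final class OldSpace {
    private ObjectTableEntry freeOtes;
    private ObjectBody freeBodies;

    void addFreeOte(ObjectTableEntry e) { e.next = freeOtes; freeOtes = e; }
    void addFreeBody(ObjectBody b)      { b.next = freeBodies; freeBodies = b; }

    // Two unlinks per allocation; returns null when either list is
    // empty (a real VM would trigger a collection or grow the space).
    ObjectTableEntry allocate() {
        ObjectTableEntry e = freeOtes;
        ObjectBody b = freeBodies;
        if (e == null || b == null) return null;
        freeOtes = e.next;
        freeBodies = b.next;
        e.body = b;
        return e;
    }
}
```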
Blair McGlashan wrote:
> [snip]
>
> It too would have to allocate from two separate lists, one for header and
> one for body, but it's not a classic generational collector so would
> perhaps do better on this unnatural test. Like I say I don't think it's
> very relevant to normal application performance. What is interesting,
> however, is the speed of the Java collectors, which are presumably more
> similar in design to VW's?

I don't think so. Aren't many of the commercial Java offerings based on the
train algorithm? VW doesn't use the train algorithm. It has a
straightforward incremental collector, stop-the-world mark-sweep, and a
three-generation system (scavenged newSpace, incrementally collected
oldSpace, and uncollected [except for stop-the-world collection]
permSpace). The only thing exotic about the VW collector is ephemerons.

--
_______________,,,^..^,,,____________________________
Eliot Miranda              Smalltalk - Scene not herd
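[Editorial aside: the scavenged newSpace Eliot mentions gets its speed from
bump-pointer allocation. A rough sketch, illustrative only and not VW's
code, with invented names:]

```java
// Illustrative bump-pointer nursery: allocation is just a bounds check
// plus a pointer increment, which is why generational collectors make
// short-lived allocation so much cheaper than free-list searches.
final class Nursery {
    private final byte[] space;
    private int top = 0;

    Nursery(int sizeBytes) { space = new byte[sizeBytes]; }

    // Returns the offset of the new object, or -1 when a scavenge is
    // needed (a real VM would copy survivors to the next generation).
    int allocate(int sizeBytes) {
        if (top + sizeBytes > space.length) return -1;
        int obj = top;
        top += sizeBytes;
        return obj;
    }

    // After survivors are evacuated, the whole nursery is free again.
    void resetAfterScavenge() { top = 0; }
}
```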
Eliot Miranda wrote:
> Aren't many of the commercial Java offerings based
> on the train algorithm?

It's difficult to find out what the Java vendors use. IBM only seem to talk
details about their research JVMs (and some of it is pretty interesting). I
haven't yet found any data about anyone else except Sun.

Sun (of course the major player) manage to obfuscate what they're doing
pretty well too. Since they seem to change the GC with every major release,
but don't often update the documentation, it's difficult to work out what's
going on. However, my best guess (and it *is* a guess, note) is that the
current (J2SDK 1.4.1) JVM uses:

1) A perm space, perhaps; there was definitely one in 1.3, but there are
hints, no more, that it's gone away in 1.4.

2) A long-lived object space, collected either by mark-and-compact or by a
parallelised equivalent; the choice is configurable, defaulting to the
non-parallel version.

3) An intermediate space that is collected either by mark-and-compact or by
a train algorithm. Again the choice is configurable; the default is to use
mark-and-compact. The distinction between 2 and 3 is only mentioned once;
mostly the doc leaves you with the impression that the train algorithm is
used for all long-lived objects (if it's used at all).

4) A nursery consisting of an allocation area and a couple of alternating
survivor spaces. Optionally (as of 1.4.1), there's a parallelised GC
available for this space too.

At one time, Sun claimed that their latest JVM (I think this was around
1.3) had improved allocation in multi-threaded apps because it now used a
per-thread pool of some sort, but I don't see how that can be reconciled
with their other documentation.
The best links I can find are:

http://java.sun.com/docs/hotspot/gc/index.html
http://developer.java.sun.com/developer/technicalArticles/Networking/HotSpot/
http://developer.java.sun.com/developer/technicalArticles/Programming/turbo/

And there's some of their research stuff at:

http://research.sun.com/jtech/pubs/

which I'm sure won't have much news for Eliot or Blair, but there's some
pretty interesting stuff there for the rest of us (those of us who happen
to be sad VM-junkies, anyway ;-)

-- chris
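[Editorial aside: on later JVMs (Java 5 onwards, where the
java.lang.management API exists -- it postdates the 1.4.1 release discussed
above), you can ask the running VM which collectors it is actually using
instead of guessing from the documentation:]

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class GcProbe {
    public static void main(String[] args) {
        // Each collector bean names the heap generations it manages.
        for (GarbageCollectorMXBean gc :
                 ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector: " + gc.getName()
                + " -> " + String.join(", ", gc.getMemoryPoolNames()));
        }
        // The pools themselves (eden, survivor, old/tenured,
        // perm gen or metaspace, depending on the JVM version).
        for (MemoryPoolMXBean pool :
                 ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println("Pool: " + pool.getName());
        }
    }
}
```

The names printed vary by vendor and by flags such as -XX:+UseParallelGC,
which makes this a handy way to check what a given release defaults to.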
On the current S#.AOS VM I tried the following code:
|kTimes := 1000000, oc, run|
VM.gcMemory.
run := {
    Time millisecondsToRun: [
        oc := OrderedCollection new: kTimes.
        1 to: kTimes do: [:each | oc add: Rectangle new]
    ].
    Time millisecondsToRun: [1 to: kTimes do: [:each | (oc at: each) top]].
    Time millisecondsToRun: [oc do: [:each | each top]].
    Time millisecondsToRun: [
        |s| := 1.00000001.
        1 to: kTimes do: [:each | s := s * 1.00000001.].
    ].
}.
{run inject: 0 into: [:a:b| a+b], run}.

My machine info is:

OS Name: Microsoft Windows XP Professional
Version: 5.1.2600 Service Pack 1 Build 2600
OS Manufacturer: Microsoft Corporation
System Name: SATELLITE
System Manufacturer: TOSHIBA
System Model: Satellite 5105
System Type: X86-based PC
Processor: x86 Family 15 Model 2 Stepping 4 GenuineIntel ~1694 Mhz
BIOS Version/Date: TOSHIBA Version 1.70, 4/8/2002
SMBIOS Version: 2.3
Windows Directory: C:\WINDOWS
System Directory: C:\WINDOWS\System32
Boot Device: \Device\HarddiskVolume1
Locale: United States
Hardware Abstraction Layer: Version = "5.1.2600.1106 (xpsp1.020828-1920)"
User Name: SATELLITE\David Simmons
Time Zone: Pacific Standard Time
Total Physical Memory: 1,024.00 MB
Available Physical Memory: 419.88 MB
Total Virtual Memory: 2.65 GB
Available Virtual Memory: 1.53 GB
Page File Space: 1.65 GB
Page File: C:\pagefile.sys

Which yielded the following runs [all numbers in milliseconds]:

== 1M CASES ==
==============
Times = 1M [actual 1.7GHz mobile cpu speed]
RUN[0]: {1988, {1277, 37, 40, 634}}
RUN[1]: {741, {147, 35, 39, 520}}
RUN[2]: {354, {144, 35, 50, 125}}
RUN[3]: {356, {141, 34, 39, 142}}

Times = 1M [1.7GHz mobile cpu speed scaled to 1.9GHz equiv]
RUN[0]: {1778, {1142, 33, 35, 567}}
RUN[0]: {663, {131, 31, 34, 465}}
RUN[0]: {316, {128, 31, 44, 111}}
RUN[0]: {318, {126, 30, 34, 127}}

== 3M CASES ==
==============
Times = 3M [actual 1.7GHz mobile cpu speed]
RUN[0]: {5781, {3537, 106, 121, 2017}}
RUN[0]: {1172, {438, 104, 120, 510}}
RUN[0]: {952, {391, 117, 121, 323}}
RUN[0]: {948, {401, 104, 120, 323}}

Times = 3M [1.7GHz mobile cpu speed scaled to 1.9GHz equiv]
RUN[0]: {5172, {3164, 94, 108, 1804}}
RUN[0]: {1048, {391, 93, 107, 456}}
RUN[0]: {851, {349, 104, 108, 289}}
RUN[0]: {848, {358, 93, 107, 289}}

----------------------
Some things to note...
----------------------

o) The current AOS.VM build I ran this against does not have "ephemeral-gc"
services enabled -- which negatively affects all constructor times -- the
ephemeral gc, among other things, uses auto-inlined custom jitted #new
methods for every class.

The biggest performance impact of this would be on the [Double mul time]
4th value (within a given run/set), but it also impacts the performance of
the rectangle construction.

Typically when the "ephemeral-gc" is enabled it generates ephemeral objects
10-20 times faster (10x vs 20x depends largely on cpu cache
characteristics).

o) My machine has a lot of other stuff running [I'm building S#.NET with
VS.NET open, sucking down Squeak 3.4, etc] which tends to make the cited
numbers probably some 10-15% higher [longer times] than they might
otherwise be.

o) I ran the tests in a heavily loaded environment browser. If this
affected the tests it should have done so by hurting the performance
numbers.

What does this mean?
--------------------

1. In all likelihood, from comparative numbers I've run in the past, the
Rectangle.new method would probably run 2X or better than the current
version using a generic #new.

2. The 4th value (within a given run/set) is very likely to be
consistently less than 100ms if the ephemeral gc was enabled.

I would guess that with ephemeral-gc services enabled we would not see the
"RUN[0]" numbers but would instead see behavior consistent with the
"RUN[1]" and better cases. In all likelihood the numbers would be somewhat
better than the "RUN[4]" cases after the initial run (or some other gc
sizing trigger) occurred.

Are such benchmarks meaningful?
-------------------------------

I suspect they are probably not generally useful for hard-comparison
purposes, because +/- some 50% could be accounted for by variances in the
images and object memory at the time/environment in which the tests were
run.

However, running such tests is useful for identifying performance issues.
Most significantly for me, I found a policy bug in the VM regarding
resizeable object growth. I revised the policy as a result of this
benchmark. Which, as is often the case, turned out to be the most
important/useful aspect for me.

-- Dave S. [www.smallscript.org]

"Blair McGlashan" <[hidden email]> wrote in message
news:b3frad$1m4hep$[hidden email]...
> "Jochen Riekhof" <[hidden email]> wrote in message
> news:[hidden email]...
> > > > If that's the case then I'd expect a difference of less than
> > > > an order of magnitude between JITed Java and Dolphin
> > > > for *integer* arithmetic and integer arrays. The difference
> > > > is huge for floating point, though. (Presumably because of
> > > > Smalltalk's "boxed" floats.) If you are seeing a 20-to-1
> > > > difference then I'd guess that nearly all of it is down to
> > > > floating-point performance.
> > >
> > > This may well be.
> >
> > No, may not be :-).
> >
> > I did a VERY quick check on simple operations performance and here is
> > the result. I measured both server and client vm in hotspot and
> > interpreted mode vs. Dolphin. All code is appended. The reason the
> > server VM seems to be slow is that it does not have enough time to
> > "warm up". It does a lot of background analysis and compilation that
> > never pays off because the app runs only a few seconds at all. You can
> > expect the server vm to be faster than the client vm after a few
> > minutes.
> >
> > There is not a BIG difference in interpreted mode vs. Dolphin, except
> > iterators are half the speed of a do: operation.
> > Hotspot any version is much faster, however.
> > Float is only about a factor of two slower in dolphin as opposed to
> > your 1 to 20 guess.
> >
> > The most striking difference came from the memory management. Dolphin
> > needed more than 30 seconds to allocate the one million Rectangle
> > objects. After close of the workspace the env. froze for about 45
> > seconds for gc (I guess).
> >...
>
> I was pretty surprised by this, so I thought I'd look to see why. Just
> looking at the script, something that was immediately apparent is that
> your allocation test is actually allocating 3 million objects on
> Dolphin, vs 1 million on Java. This is because Smalltalk Rectangles are
> actually implemented as a pair of Point objects, whereas Java's is a
> single block of memory holding 4 integer values. Since this is a
> micro-benchmark designed to measure object allocation speed, I think it
> really ought to try and measure the same number of allocations. Note
> though that on VW, Rectangle class>>new answers an uninitialized
> Rectangle, so it is only performing 1 million allocations, at least if
> we ignore the allocations needed to grow the OrderedCollection. I
> noticed this when trying to run your benchmark on VW, as it failed on
> the second expression when attempting to access #top of the first
> Rectangle. Another point to note is that this isn't a particularly pure
> test of allocation speed, as Smalltalk has to send a few messages to
> initialize a Rectangle.
>
> Anyway, regardless of this, I tried out the following slight
> modification of your script on the 2.2Ghz P4 Xeon with 512Mb I happened
> to be using:
>
> start := Time millisecondClockValue.
> Transcript display: 'Alloc time: '; print: (Time millisecondsToRun: [
>     oc := OrderedCollection new.
>     "Use #origin:corner: so will also run on VW - note this actually
>     allocates 3 million objects"
>     1 to: 1000000 do: [:each | oc add: (Rectangle origin: 0@0 corner: 0@0)]]);
>     cr.
> Transcript display: 'Get (index) time: '; print: (Time millisecondsToRun:
>     [1 to: 1000000 do: [:each | (oc at: each) top]]); cr.
> Transcript display: 'Iterate (do) time: '; print: (Time millisecondsToRun:
>     [oc do: [:each | each top]]); cr.
> Transcript display: 'Double mul time: '; print: (Time millisecondsToRun:
>     [s := 1.00000001. 1 to: 1000000 do: [:each | s := s * 1.00000001.]]); cr.
> Transcript display: 'GC time: '; print: (Time millisecondsToRun: [oc := s :=
>     nil. MemoryManager current collectGarbage
>     "or ObjectMemory quickGC on VW"]); cr.
> Transcript display: 'Overall runtime: '; print: (Time
>     millisecondClockValue - start); cr
>
> These are the times I got from Dolphin 6 for the first and second runs,
> times in milliseconds:
>
> Alloc time: 4116
> Get (index) time: 422
> Iterate (do) time: 289
> Double mul time: 116
> GC time: 204
> Overall runtime: 5159
>
> Alloc time: 1408
> Get (index) time: 418
> Iterate (do) time: 290
> Double mul time: 105
> GC time: 211
> Overall runtime: 2441
>
> Running it a number of times, the figures varied a bit, but I haven't
> bothered to average them.
>
> As you can see the first run allocation time was significantly better
> than your experience. I didn't know your machine spec but assumed that
> it must be similar since the second run results are similar. I also
> didn't see any extended GC time, even if I replaced the #collectGarbage
> with a #compact, though doing that did mean that the subsequent run
> figures were not much faster than the first on the allocation test.
> Anyway, I thought this must be something massively improved in D6 vs D5
> (though I can't for the life of me think what :-)), so I went back to D5
> and got these results:
>
> Alloc time: 52363
> Get (index) time: 411
> Iterate (do) time: 284
> Double mul time: 123
> GC time: 190
> Overall runtime: 53375
>
> Alloc time: 1275
> Get (index) time: 414
> Iterate (do) time: 288
> Double mul time: 112
> GC time: 186
> Overall runtime: 2285
>
> I was happy that this coincided with your experience on the initial
> allocation behaviour (though not that D6 was 100mS slower on the
> subsequent run, even though this is probably just timing variability).
>
> I was still mystified as to the delay you experienced closing the
> workspace, since this didn't seem to be borne out by the forced GC
> timings (and if you insert a 'Rectangle primAllInstances size' at the
> end of the script, you'll see that those Rectangles really have been
> collected). So I thought I'd try out doing as you did, and simply
> closing the workspace, leaving the variables to be collected at idle
> time. To my surprise I experienced exactly the same lengthy freeze. I
> didn't measure its duration, but it was lengthy. I found that if I
> nilled out the workspace variables before closing the workspace, the
> delay did not occur, so I could only conclude that there is something
> very odd going on in the interaction between the view closing and the
> activities of the garbage collector. Obviously this needs to be
> investigated, but I don't think it is a fundamental performance problem
> in the Dolphin collector, as otherwise my other tests would also have
> shown that.
>
> As a point of reference I tried running the script on VWNC7. I had to
> change the Transcript #display: messages to #show:, and use
> "ObjectMemory quickGC" in place of "MemoryManager current
> collectGarbage" (it seemed the nearest equivalent), and this is what I
> got.
>
> Alloc time: 40849
> Get (index) time: 77
> Iterate (do) time: 51
> Double mul time: 327
> GC time: 116
> Overall runtime: 41445
>
> [Subsequent runs were similar]
>
> As you can see, performance on the initial allocation test was poor. I
> think this is because I either have insufficient memory to run the test
> in VW, or (more likely) the default memory policy/configuration is not
> appropriate for this test. Certainly there was an awful lot of flashing
> up of the GC and dustbin cursors while the test was running. So anyway,
> I don't think it is really a valid result, and I also think the FP mul
> figure is questionable, since once again this was probably
> over-influenced by GC activity.
>
> Anyway Jochen, I believe what has brought us to this point was your
> statement: "Performance is about factor twenty lower than Java HotSpot
> VM, ...". On this test at least, that would appear to be FUD, right? :-)
>
> [Frankly, though, I think you really need some more "macro" benchmarks,
> i.e. closer to an actual application, to draw any real performance
> conclusions]
>
> Regards
>
> Blair
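[Editorial aside: for readers wanting to reproduce the Java side of this
comparison, here is a reconstruction of the micro-benchmark in Java --
Jochen's actual code is not shown in this thread, so this is an
approximation of it using java.awt.Rectangle (whose field is `y` rather
than a #top message). Note that a modern JIT may partially eliminate the
dead read loops, so treat the timings loosely:]

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

public class MicroBench {
    static final int N = 1_000_000;

    // Wall-clock timing helper, mirroring Time millisecondsToRun:.
    static long timeMs(Runnable r) {
        long start = System.currentTimeMillis();
        r.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        List<Rectangle> oc = new ArrayList<>();
        System.out.println("Alloc time: " + timeMs(() -> {
            for (int i = 0; i < N; i++) oc.add(new Rectangle());
        }));
        System.out.println("Get (index) time: " + timeMs(() -> {
            for (int i = 0; i < N; i++) { int y = oc.get(i).y; }
        }));
        System.out.println("Iterate time: " + timeMs(() -> {
            for (Rectangle r : oc) { int y = r.y; }
        }));
        System.out.println("Double mul time: " + timeMs(() -> {
            double s = 1.00000001;
            for (int i = 0; i < N; i++) s *= 1.00000001;
        }));
    }
}
```

As Blair points out above, this allocates 1 million flat objects, where the
Smalltalk version allocates 3 million (a Rectangle plus two Points), so the
allocation figures are not directly comparable.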
Dag...
The:

> RUN[0]: ...
> RUN[0]: ...
> RUN[0]: ...
> RUN[0]: ...

forms should be read as:

> RUN[1]: ...
> RUN[2]: ...
> RUN[3]: ...
> RUN[4]: ...

It was my goof. I got interrupted in the midst of writing this to attend to
my lamb-roast dinner in the oven and forgot to come back and annotate them
properly. Sigh.

-- Dave S. [www.smallscript.org]

"David Simmons" <[hidden email]> wrote in message
news:[hidden email]...
> On the current S#.AOS VM I tried the following code:
> [snip]
FYI,
On a slightly less loaded machine test [just VS.NET and Outlook] using the same large image, yielded: == 1M CASES == ============== Times = 1M [actual 1.7GHz mobile cpu speed] RUN[0]: {1735, {996, 34, 40, 665}} RUN[1]: {719, {123, 34, 40, 522}} RUN[2]: {301, {132, 34, 39, 96}} Times = 1M [1.7GHz mobile cpu speed scaled to 1.9MHz equiv] RUN[0]: {1552, {891, 30, 35, 595}} RUN[1]: {643, {110, 30, 35, 467}} RUN[2]: {269, {118, 30, 34, 85}} ====================== ------ Errata (previous post) ------ RUN[0]: ... RUN[0]: ... RUN[0]: ... RUN[0]: ... forms, should be read as: RUN[0]: ... RUN[1]: ... RUN[2]: ... RUN[3]: ... -- Dave S. "David Simmons" <[hidden email]> wrote in message news:[hidden email]... > On the current S#.AOS VM I tried the following code: > > |kTimes := 1000000, oc, run| > VM.gcMemory. > run := { > Time millisecondsToRun: [ > oc := OrderedCollection new: kTimes. > 1 to: kTimes do: [:each | oc add: Rectangle new] > ]. > Time millisecondsToRun: [1 to: kTimes do: [:each | > (oc at: each) top]]. > Time millisecondsToRun: [oc do: [:each | each top]]. > Time millisecondsToRun: [ > |s| := 1.00000001. > 1 to: kTimes do: [:each | s := s * 1.00000001.]. > ]. > }. > {run inject: 0 into: [:a:b| a+b], run}. 
>
> My machine info is:
>
> OS Name Microsoft Windows XP Professional
> Version 5.1.2600 Service Pack 1 Build 2600
> OS Manufacturer Microsoft Corporation
> System Name SATELLITE
> System Manufacturer TOSHIBA
> System Model Satellite 5105
> System Type X86-based PC
> Processor x86 Family 15 Model 2 Stepping 4 GenuineIntel ~1694 Mhz
> BIOS Version/Date TOSHIBA Version 1.70, 4/8/2002
> SMBIOS Version 2.3
> Windows Directory C:\WINDOWS
> System Directory C:\WINDOWS\System32
> Boot Device \Device\HarddiskVolume1
> Locale United States
> Hardware Abstraction Layer Version = "5.1.2600.1106
> User Name SATELLITE\David Simmons
> Time Zone Pacific Standard Time
> Total Physical Memory 1,024.00 MB
> Available Physical Memory 419.88 MB
> Total Virtual Memory 2.65 GB
> Available Virtual Memory 1.53 GB
> Page File Space 1.65 GB
> Page File C:\pagefile.sys
>
> Which yielded the following runs [all numbers in milliseconds]:
>
> == 1M CASES ==
> ==============
> Times = 1M [actual 1.7GHz mobile cpu speed]
> RUN[0]: {1988, {1277, 37, 40, 634}}
> RUN[1]: {741, {147, 35, 39, 520}}
> RUN[2]: {354, {144, 35, 50, 125}}
> RUN[3]: {356, {141, 34, 39, 142}}
>
> Times = 1M [1.7GHz mobile cpu speed scaled to 1.9GHz equiv]
> RUN[0]: {1778, {1142, 33, 35, 567}}
> RUN[1]: {663, {131, 31, 34, 465}}
> RUN[2]: {316, {128, 31, 44, 111}}
> RUN[3]: {318, {126, 30, 34, 127}}
>
> == 3M CASES ==
> ==============
> Times = 3M [actual 1.7GHz mobile cpu speed]
> RUN[0]: {5781, {3537, 106, 121, 2017}}
> RUN[1]: {1172, {438, 104, 120, 510}}
> RUN[2]: {952, {391, 117, 121, 323}}
> RUN[3]: {948, {401, 104, 120, 323}}
>
> Times = 3M [1.7GHz mobile cpu speed scaled to 1.9GHz equiv]
> RUN[0]: {5172, {3164, 94, 108, 1804}}
> RUN[1]: {1048, {391, 93, 107, 456}}
> RUN[2]: {851, {349, 104, 108, 289}}
> RUN[3]: {848, {358, 93, 107, 289}}
>
> ----------------------
> Some things to note...
> ----------------------
>
> o) The current AOS.VM build I ran this against does not have services
> enabled -- which negatively affects all constructor times -- the ephemeral
> gc, among other things, uses auto-inlined custom jitted #new methods for
> every class.
>
> The biggest performance impact of this would be on the [Double mul time]
> 4th value (within a given run/set), but it also impacts the performance of
> the rectangle construction.
>
> Typically when the "ephemeral-gc" is enabled it generates ephemeral objects
> 10-20 times faster (10 vs 20x depends largely on cpu cache characteristics).
>
> o) My machine has a lot of other stuff running [I'm building S#.NET with
> VS.NET open, sucking down Squeak 3.4, etc] which tends to cause the cited
> numbers to be probably some 10-15% higher [longer times] than they might
> otherwise be.
>
> o) I ran the tests in a heavily loaded environment browser. If this affected
> the tests it should have done so by hurting the performance numbers.
>
> What does this mean?
> -------------------
>
> 1. In all likelihood, from comparative numbers I've run in the past, the
> Rectangle.new method would probably run 2X or better than the current
> version using a generic #new.
>
> 2. The 4th value (within a given run/set) is very likely to be less than
> 100ms if the ephemeral gc was enabled.
>
> I would guess that with ephemeral-gc services enabled we would not see the
> "RUN[0]" numbers but would instead see behavior consistent with the "RUN[1]"
> and better cases. In all likelihood the numbers would be somewhat better
> than the "RUN[3]" cases after the initial run (or some other gc sizing
> trigger) occurred.
>
> Are such benchmarks meaningful?
> ------------------------------
>
> I suspect they are probably not generally useful for hard-comparison
> purposes because +/- some 50% could be accounted for by variances in images
> and object memory at the time/environment in which the tests were run.
>
> However, running such tests is useful for identifying performance issues.
> Most significantly for me, I found a policy bug in the VM regarding
> resizeable object growth. I revised the policy as a result of this
> benchmark, which, as is often the case, turned out to be the most
> important/useful aspect for me.
>
> -- Dave S. [www.smallscript.org]
>
> "Blair McGlashan" <[hidden email]> wrote in message
> news:b3frad$1m4hep$[hidden email]...
> > "Jochen Riekhof" <[hidden email]> wrote in message
> > news:[hidden email]...
> > > > > If that's the case then I'd expect a difference of less than an
> > > > > order of magnitude between JITed Java and Dolphin for *integer*
> > > > > arithmetic and integer arrays. The difference is huge for floating
> > > > > point, though. (Presumably because of Smalltalk's "boxed" floats.)
> > > > > If you are seeing a 20-to-1 difference then I'd guess that nearly
> > > > > all of it is down to floating-point performance.
> > > >
> > > > This may well be.
> > >
> > > No, may not be :-).
> > >
> > > I did a VERY quick check on simple operations performance and here is
> > > the result. I measured both server and client vm, in hotspot and
> > > interpreted mode, vs. Dolphin. All code is appended. The reason the
> > > server VM seems to be slow is that it does not have enough time to
> > > "warm up". It does a lot of background analysis and compilation that
> > > never pays off because the app only runs for a few seconds. You can
> > > expect the server vm to be faster than the client vm after a few
> > > minutes.
> > >
> > > There is not a BIG difference in interpreted mode vs. Dolphin, except
> > > that iterators are half the speed of a do: operation. Hotspot, any
> > > version, is much faster, however. Float is only about factor two slower
> > > in Dolphin, as opposed to your 1 to 20 guess.
> > >
> > > The most striking difference came from the memory management.
> > > Dolphin needed more than 30 seconds to allocate the one million
> > > Rectangle objects. After closing the workspace the environment froze
> > > for about 45 seconds for gc (I guess).
> > >...
> >
> > I was pretty surprised by this, so I thought I'd look to see why. Just
> > looking at the script, something that was immediately apparent is that
> > your allocation test is actually allocating 3 million objects on Dolphin,
> > vs 1 million on Java. This is because Smalltalk Rectangles are actually
> > implemented as a pair of Point objects, whereas Java's is a single block
> > of memory holding 4 integer values. Since this is a micro-benchmark
> > designed to measure object allocation speed, I think it really ought to
> > try and measure the same number of allocations. Note though that on VW,
> > Rectangle class>>new answers an uninitialized Rectangle, so it is only
> > performing 1 million allocations, at least if we ignore the allocations
> > needed to grow the OrderedCollection. I noticed this when trying to run
> > your benchmark on VW, as it failed on the second expression when
> > attempting to access #top of the first Rectangle. Another point to note
> > is that this isn't a particularly pure test of allocation speed, as
> > Smalltalk has to send a few messages to initialize a Rectangle.
> >
> > Anyway, regardless of this, I tried out the following slight modification
> > of your script on the 2.2Ghz P4 Xeon with 512Mb I happened to be using:
> >
> > start := Time millisecondClockValue.
> > Transcript display: 'Alloc time: '; print: (Time millisecondsToRun: [
> >     oc := OrderedCollection new.
> >     "Use #origin:corner: so it will also run on VW - note this actually
> >     allocates 3 million objects"
> >     1 to: 1000000 do: [:each | oc add: (Rectangle origin: 0@0 corner: 0@0)]]); cr.
> > Transcript display: 'Get (index) time: '; print: (Time millisecondsToRun:
> >     [1 to: 1000000 do: [:each | (oc at: each) top]]); cr.
> > Transcript display: 'Iterate (do) time: '; print: (Time millisecondsToRun:
> >     [oc do: [:each | each top]]); cr.
> > Transcript display: 'Double mul time: '; print: (Time millisecondsToRun:
> >     [s := 1.00000001. 1 to: 1000000 do: [:each | s := s * 1.00000001.]]); cr.
> > Transcript display: 'GC time: '; print: (Time millisecondsToRun: [oc := s :=
> >     nil. MemoryManager current collectGarbage "or ObjectMemory quickGC on VW"]); cr.
> > Transcript display: 'Overall runtime: '; print: (Time
> >     millisecondClockValue - start); cr
> >
> > These are the times I got from Dolphin 6 for the first and second runs,
> > times in milliseconds:
> >
> > Alloc time: 4116
> > Get (index) time: 422
> > Iterate (do) time: 289
> > Double mul time: 116
> > GC time: 204
> > Overall runtime: 5159
> >
> > Alloc time: 1408
> > Get (index) time: 418
> > Iterate (do) time: 290
> > Double mul time: 105
> > GC time: 211
> > Overall runtime: 2441
> >
> > Running it a number of times, the figures varied a bit, but I haven't
> > bothered to average them.
> >
> > As you can see, the first run allocation time was significantly better
> > than your experience. I didn't know your machine spec, but assumed that it
> > must be similar since the second run results are similar. I also didn't
> > see any extended GC time, even if I replaced the #collectGarbage with a
> > #compact, though doing that did mean that the subsequent run figures were
> > not much faster than the first on the allocation test.
> > Anyway, I thought this must be something massively improved in D6 vs D5
> > (though I can't for the life of me think what :-)), so I went back to D5
> > and got these results:
> >
> > Alloc time: 52363
> > Get (index) time: 411
> > Iterate (do) time: 284
> > Double mul time: 123
> > GC time: 190
> > Overall runtime: 53375
> >
> > Alloc time: 1275
> > Get (index) time: 414
> > Iterate (do) time: 288
> > Double mul time: 112
> > GC time: 186
> > Overall runtime: 2285
> >
> > I was happy that this coincided with your experience on the initial
> > allocation behaviour (though not that D6 was 100mS slower on the
> > subsequent run, even though this is probably just timing variability).
> >
> > I was still mystified as to the delay you experienced closing the
> > workspace, since this didn't seem to be borne out by the forced GC timings
> > (and if you insert a 'Rectangle primAllInstances size' at the end of the
> > script, you'll see that those Rectangles really have been collected). So I
> > thought I'd try out doing as you did, and simply closing the workspace
> > leaving the variables to be collected at idle time. To my surprise I
> > experienced exactly the same lengthy freeze. I didn't measure its
> > duration, but it was lengthy. I found that if I nilled out the workspace
> > variables before closing the workspace, the delay did not occur, so I
> > could only conclude that there is something very odd going on in the
> > interaction between the view closing and the activities of the garbage
> > collector. Obviously this needs to be investigated, but I don't think it
> > is a fundamental performance problem in the Dolphin collector, as
> > otherwise my other tests would also have shown that.
> >
> > As a point of reference I tried running the script on VWNC7.
> > I had to change the Transcript #display: messages to #show:, and use
> > "ObjectMemory quickGC" in place of "MemoryManager current collectGarbage"
> > (it seemed the nearest equivalent), and this is what I got.
> >
> > Alloc time: 40849
> > Get (index) time: 77
> > Iterate (do) time: 51
> > Double mul time: 327
> > GC time: 116
> > Overall runtime: 41445
> >
> > [Subsequent runs were similar]
> >
> > As you can see, performance on the initial allocation test was poor. I
> > think this is because I either have insufficient memory to run the test
> > in VW, or (more likely) the default memory policy/configuration is not
> > appropriate for this test. Certainly there was an awful lot of flashing
> > up of the GC and dustbin cursors when the test was running. So anyway, I
> > don't think it's really a valid result, and I also think the FP mul figure
> > is questionable since once again this was probably over-influenced by GC
> > activity.
> >
> > Anyway Jochen, I believe what has brought us to this point was your
> > statement: "Performance is about factor twenty lower than Java HotSpot
> > VM, ..." On this test at least, that would appear to be FUD, right? :-)
> >
> > [Frankly, though, I think you really need some more "macro" benchmarks,
> > i.e. closer to an actual application, to draw any real performance
> > conclusions]
> >
> > Regards
> >
> > Blair
|
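Blair's point about allocation counts can be made concrete. The two layouts he contrasts, a Smalltalk-style Rectangle holding two Point objects versus Java's flat four-int java.awt.Rectangle, can be sketched in Java as follows (the class names here are illustrative, not from the thread):

```java
public class LayoutDemo {
    static class Point2 {                 // stands in for a Smalltalk Point
        int x, y;
        Point2(int x, int y) { this.x = x; this.y = y; }
    }

    // Smalltalk-style: constructing one rectangle allocates THREE objects
    // (the rectangle itself plus its origin and corner Points).
    static class BoxedRect {
        Point2 origin, corner;
        BoxedRect(Point2 origin, Point2 corner) { this.origin = origin; this.corner = corner; }
        int top() { return origin.y; }
    }

    // Java-style: one flat object holding four ints -- a single allocation.
    static class FlatRect {
        int x, y, width, height;
        FlatRect(int x, int y, int w, int h) { this.x = x; this.y = y; width = w; height = h; }
        int top() { return y; }
    }

    public static void main(String[] args) {
        BoxedRect b = new BoxedRect(new Point2(0, 0), new Point2(10, 10));
        FlatRect f = new FlatRect(0, 0, 10, 10);
        System.out.println(b.top() + " " + f.top()); // prints "0 0"
    }
}
```

This is why the same `1 to: 1000000` loop performs roughly three million allocations in Dolphin against one million in Java, before even counting the OrderedCollection growth Blair mentions.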
In reply to this post by Blair McGlashan
"Blair McGlashan" <[hidden email]> wrote in message
news:b334s0$1i941s$[hidden email]...
> Incidentally if you have installed PL2, you can easily view a list of all
> accelerator keys in the browsers by choosing the 'Key Bindings' command on
> the common Help menu.

I've installed PL2 and PL3, but I don't see that menu choice (and I really could use some keyboard shortcuts!).
|
Hi Mark,
> I've installed PL2 and PL3, but I don't see that menu choice (and I
> really could use some keyboard shortcuts!).

Looks like this patch needs initialising before it works in anything other than the Class or System Browser. Evaluate the following in a workspace and all (?) of the Help menus in browsers should gain a "Key Bindings" entry.

OAIDEExtensions initialize

-- Ian
|
"Ian Bartholomew" <[hidden email]> wrote in message
news:BAfda.1257$[hidden email]...
>
> > I've installed PL2 and PL3, but I don't see that menu choice (and I
> > really could use some keyboard shortcuts!).
>
> Looks like this patch needs initialising before it works in anything
> other than the Class or System Browser.

I don't have the menu choice there, either.

> Evaluate the following in a workspace and all (?) of the Help menus in
> browsers should gain a "Key Bindings" entry.
>
> OAIDEExtensions initialize

I don't have that class. Can it be that I don't have PL3? About Dolphin Smalltalk shows 5.0.3.

Anyway, I think another message from Blair or Andy says this was actually a V6 feature.
|
Mark,
>> OAIDEExtensions initialize
>
> I don't have that class. Can it be that I don't have PL3? About
> Dolphin Smalltalk shows 5.0.3.

You should have. It was introduced with Dolphin 5.0.0 to provide a way of hooking into a browser as it opened. It enables you to add additional functionality to existing tools without having to modify any existing resources.

FWIW - If I cut/paste the above text into my 5.0.3 it evaluates without a problem (I even spelt initialise the (in)correct way - something I often forget :-) )

> Anyway, I think another message from Blair or Andy says this was
> actually a V6 feature.

It works (once I reinitialised the class as above) in my 5.0.3. All of the browsers now have a "Key Bindings" option on the help menu that generates a html page containing the key bindings for the current tool and opens up your html browser on the result.

-- Ian
|
In reply to this post by Mark Wilden
"Mark Wilden" <[hidden email]> wrote in message
news:[hidden email]...
> "Ian Bartholomew" <[hidden email]> wrote in message
> news:BAfda.1257$[hidden email]...
> >....
> > OAIDEExtensions initialize
>
> I don't have that class. ....

Are you sure? It's a member of the 'Dolphin IDE Extension Example' package, which is located in the 'Object Arts\Samples\IDE\' folder. Have you perhaps uninstalled all the samples?

> ...
> Anyway, I think another message from Blair or Andy says this was actually a
> V6 feature.

It was originally, but we provided a version of it in PL2.

Regards

Blair
|
In reply to this post by Mark Wilden
Mark,
> I don't have that class. Can it be that I don't have PL3? About
> Dolphin Smalltalk shows 5.0.3.

Ahh, a light bulb has just switched on. I checked the patch list and it is shown as patch DSE #975 - for Dolphin Standard and Pro only. I guess you are using DVE? I don't know why it wasn't included in the DVE - it seems that it would be most applicable there. Possibly because something in the list creation process needs DSE or DPro?

> Anyway, I think another message from Blair or Andy says this was
> actually a V6 feature.

That bit appears to just be an enhancement where D6 will open the list in its own browser rather than spawning your default internet browser.

-- Ian
|
In reply to this post by Ian Bartholomew-18
"Ian Bartholomew" <[hidden email]> wrote in message
news:jQgda.603$[hidden email]...
> >
> > I don't have that class. Can it be that I don't have PL3? About
> > Dolphin Smalltalk shows 5.0.3.
>
> You should have. It was introduced with Dolphin 5.0.0 to provide a way
> of hooking into a browser as it opened. It enables you to add
> additional functionality to existing tools without having to modify any
> existing resources.

Okey-doke. I'd uninstalled that package (temporarily) along with the rest of the Samples (as you thought, Blair). I put it back and evaluated OAIDEExtensions initialize, but I still don't get any darn Help menu choice for key bindings in a CHB. :(
|
In reply to this post by Ian Bartholomew-18
"Ian Bartholomew" <[hidden email]> wrote in message
news:adhda.1277$[hidden email]...
> Mark,
>
> > I don't have that class. Can it be that I don't have PL3? About
> > Dolphin Smalltalk shows 5.0.3.
>
> Ahh, a light bulb has just switched on. I checked the patch list and it
> is shown as patch DSE #975 - for Dolphin Standard and Pro only. I guess
> you are using DVE?

No, I'm using the downloaded version of Pro. All the OAIDEExtensions class does, as far as I can see, is add a class comment pane context menu choice, 'Emit Class Layout Description', which does nothing that I can see.
|
"Mark Wilden" <[hidden email]> wrote in message
news:[hidden email]...
> a class comment pane context menu choice Emit
> Class Layout Description which does nothing that I can see.

I take that back--I can choose a protocol for each instance variable to put in the comment.
|
In reply to this post by Mark Wilden
"Mark Wilden" <[hidden email]> wrote in message
news:[hidden email]...
> "Ian Bartholomew" <[hidden email]> wrote in message
> news:jQgda.603$[hidden email]...
> >
> > > I don't have that class. Can it be that I don't have PL3? About
> > > Dolphin Smalltalk shows 5.0.3.
> >
> > You should have. It was introduced with Dolphin 5.0.0 to provide a way
> > of hooking into a browser as it opened. It enables you to add
> > additional functionality to existing tools without having to modify any
> > existing resources.
>
> Okey-doke. I'd uninstalled that package (temporarily) along with the rest
> of the Samples (as you thought, Blair). I put it back and evaluated
> OAIDEExtensions initialize, but I still don't get any darn Help menu choice
> for key bindings in a CHB. :(

Well of course if you reload the package after installing the patch, then you won't get any of the patched methods but the originals. We recommend patching a freshly installed image to avoid problems like this - also look out for errors on the Transcript when applying a patch; there shouldn't be any.

We will be releasing a complete new download of 5.0.4 or 5.1 soon, this week if all goes to plan. In the meantime you may want to try and repeat the patch process against a freshly installed image. First package and save your existing work, and also save the existing image to a new name as a backup. Then if you find anything missing you can still load up the old image, or use Ian's excellent Chunk Browser tool to pull it from the old change log.

Please follow these steps exactly when installing the patches:

1) Start LiveUpdate and wait for it to download the patches.
2) Sort the list of patches into ascending order (click the column header).
3) Select only Patch Level 2, and apply it.
4) Exit and restart LiveUpdate.
5) Select and apply Patch Level 3.
6) Save the image, and save another copy as a baseline 5.0.3 for later use.

These steps are necessary because of an issue with LiveUpdate itself, corrected by PL2.
Another (advanced) alternative would be to open the change log, locate the chunks relating to #975, and file them in.

Regards

Blair
|
"Blair McGlashan" <[hidden email]> wrote in message
news:b549s6$24pcnk$[hidden email]...
>
> Well of course if you reload the package after installing the patch, then
> you won't get any of the patched methods but the originals.

Ah, of course. :)

> Another (advanced) alternative would be to open the change log, locate the
> chunks relating to #975, and file them in.

It was as simple as that! Now I have my Key Bindings menu selection.

Thanks, Blair & Ian--now it's time for all us ex-pat British-Canadian-Californians to hit the hay. :)

BTW, the only evaluation question I had concerning purchase of Dolphin was "how active/current is the support community?" The question has been answered.

Thanks again, guys.
|